<212> DNA

<213> Artificial sequence 



<220> 

<223> Ancestral HIV-1 group M, subtype B, env sequence
<400> 1 



atgcgcgtga agggcatccg caagaactac cagcacctgt ggcgctgggg caccatgctg 60

ctggggatgc tgatgatctg ctccgcggcc gagaagctgt gggtgaccgt gtactacggc 120

gtgcccgtgt ggaaggaggc caccaccacc ctgttctgcg ccagcgacgc caaggcttac 180

gacaccgagg tccacaacgt gtgggccacc cacgcctgcg tgcccaccga ccccaacccc 240

caggaggtgg tgctggagaa cgtgaccgag aacttcaaca tgtggaagaa caacatggtg 300

gagcagatgc acgaggacat catcagcctg tgggaccaga gcctgaagcc ctgcgtgaag 360

ttaacccccc tgtgcgtgac cctgaactgc accgacgacc tgcgcaccaa cgccaccaac 420

accaccaaca gcagcgccac caccaacacc accagcagcg gcggcggcac gatggagggc 480

gagaagggcg agatcaagaa ctgcagcttc aacgtgacca ccagcatccg cgacaagatg 540

cagaaggagt acgccctgtt ctacaagctg gacgtggtgc ccatcgacaa cgacaacaac 600

aacaccaaca acaacaccag ctaccgcctc atcaactgca acaccagcgt gatcacccag 660

gcctgcccca aggtgagctt cgagcccatc cccatccact actgcacccc cgccggcttc 720

gccatcctga agtgcaacga caagaagttc aacggcaccg gcccctgcac caacgtgagc 780

accgtgcagt gcacccacgg catccgcccc gtggtgagca cccagctgct gctgaacggc 840

agcctggccg aggaggaggt ggtgatccgc agcgagaact tcaccgacaa cgccaagacc 900

atcatcgtgc agctgaacga gagcgtggag atcaactgca cgcgtcccaa caacaacacc 960

cgcaagagca tccccatcgg ccctggccgc gccctgtacg ccaccggcaa gatcatcggc 1020

gacatccgcc aggcccactg caacctgtcg cgagccaagt ggaacaacac cctgaagcag 1080

atcgtgacca agctgcgcga gcagttcggc aacaacaaga ccaccatcgt gttcaaccag 1140

agcagcggcg gcgaccccga gatcgtgatg cacagcttca actgcggcgg cgaattcttc 1200

tactgcaaca gcacccagct gttcaacagc acctggcact tcaacggcac ctggggcaac 1260

aacaacaccg agcgcagcaa caacgccgcc gacgacaacg acaccatcac cctgccctgc 1320

cgcatcaagc agatcatcaa catgtggcag gaggtgggca aggccatgta cgcccccccc 1380

atcagcggcc agatccgctg cagcagcaac atcaccggcc tgctgctgac tcgagacggc 1440

ggcaacaacg agaacaccaa caacaccgac accgagatct tccgccccgg gggcggcgac 1500

atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtgaagat cgagcccctg 1560

ggcgtggccc ccaccaaggc caagcgccgc gtggtgcagc gcgagaagcg cgccgtgggc 1620

atgctgggcg ccatgttcct gggcttcctg ggcgccgccg gcagcaccat gggcgccgcc 1680

agcatgaccc tgaccgtgca ggcccgccag ctgctgagcg gcatcgtgca gcagcagaac 1740

aacctgctgc gcgccatcga ggcccagcag cacctgctgc agctgaccgt gtggggcatc 1800

aagcagctgc aggcccgcgt gctggccgtg gagcggtacc tgaaggacca gcagctgctg 1860

ggcatctggg gctgcagcgg caagctgatc tgcaccaccg cggtgccctg gaacgccagc 1920

tggagcaaca agagcctgga caagatctgg aacaacatga cctggatgga gtgggagcgc 1980

gagatcgaca actacaccgg cctgatctac accctgatcg aggagagcca gaaccagcag 2040

gagaagaacg agcaggagct gctggagctg gacaagtggg ccagcctgtg gaactggttc 2100

gatatcacca actggctgtg gtacatcaag atcttcatca tgatcgtggg cggcctggtg 2160

ggcctgcgca tcgtgttcgc cgtgctgagc atcgtgaacc gcgtgcgcca gggctacagc 2220

cccctgagct tccagacccg cctgcccgcc ccccgcggcc ccgaccgccc cgagggcatc 2280

gaggaggagg gcggcgagcg cgaccgcgac cgcagcgggc gcctggtgaa cggcttcctg 2340

gccctgatct gggacgacct gcgcagcctg tgcctgttca gctaccaccg cctgcgcgac 2400

ctgctgctga tcgtggcccg catcgtggag ctgctgggcc ggcgcggctg ggaggccctg 2460

aagtattggt ggaacctgct gcagtactgg agccaggagc tgaagaacag cgccgtgagc 2520

ctgctgaacg ccaccgccat cgccgtggcc gagggcaccg accgcgtgat cgaggtggtg 2580

cagcgcgcct gccgcgccat cctgcacatc ccccgccgca tccgccaggg cctggagcgc 2640

gccctgctgt ga 2652



<210> 2
<211> 883
<212> PRT

<213> Artificial sequence
<220>

<223> Ancestral HIV-1 group M, subtype B, env sequence
<400> 2 

Met Arg Val Lys Gly Ile Arg Lys Asn Tyr Gln His Leu Trp Arg Trp
1 5 10 15

Gly Thr Met Leu Leu Gly Met Leu Met Ile Cys Ser Ala Ala Glu Lys
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Thr
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Asp Thr Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Val Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asn Met Val Glu Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Thr Asp Asp Leu Arg Thr Asn Ala Thr Asn Thr Thr Asn Ser
130 135 140

Ser Ala Thr Thr Asn Thr Thr Ser Ser Gly Gly Gly Thr Met Glu Gly
145 150 155 160

Glu Lys Gly Glu Ile Lys Asn Cys Ser Phe Asn Val Thr Thr Ser Ile
165 170 175

Arg Asp Lys Met Gln Lys Glu Tyr Ala Leu Phe Tyr Lys Leu Asp Val
180 185 190

Val Pro Ile Asp Asn Asp Asn Asn Asn Thr Asn Asn Asn Thr Ser Tyr
195 200 205

Arg Leu Ile Asn Cys Asn Thr Ser Val Ile Thr Gln Ala Cys Pro Lys
210 215 220

Val Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Thr Pro Ala Gly Phe
225 230 235 240

Ala Ile Leu Lys Cys Asn Asp Lys Lys Phe Asn Gly Thr Gly Pro Cys
245 250 255

Thr Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Arg Pro Val Val
260 265 270

Ser Thr Gln Leu Leu Leu Asn Gly Ser Leu Ala Glu Glu Glu Val Val
275 280 285

Ile Arg Ser Glu Asn Phe Thr Asp Asn Ala Lys Thr Ile Ile Val Gln
290 295 300

Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn Asn Asn Thr
305 310 315 320

Arg Lys Ser Ile Pro Ile Gly Pro Gly Arg Ala Leu Tyr Ala Thr Gly
325 330 335

Lys Ile Ile Gly Asp Ile Arg Gln Ala His Cys Asn Leu Ser Arg Ala
340 345 350

Lys Trp Asn Asn Thr Leu Lys Gln Ile Val Thr Lys Leu Arg Glu Gln
355 360 365

Phe Gly Asn Asn Lys Thr Thr Ile Val Phe Asn Gln Ser Ser Gly Gly
370 375 380

Asp Pro Glu Ile Val Met His Ser Phe Asn Cys Gly Gly Glu Phe Phe
385 390 395 400

Tyr Cys Asn Ser Thr Gln Leu Phe Asn Ser Thr Trp His Phe Asn Gly
405 410 415

Thr Trp Gly Asn Asn Asn Thr Glu Arg Ser Asn Asn Ala Ala Asp Asp
420 425 430

Asn Asp Thr Ile Thr Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met
435 440 445

Trp Gln Glu Val Gly Lys Ala Met Tyr Ala Pro Pro Ile Ser Gly Gln
450 455 460

Ile Arg Cys Ser Ser Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly
465 470 475 480

Gly Asn Asn Glu Asn Thr Asn Asn Thr Asp Thr Glu Ile Phe Arg Pro
485 490 495

Gly Gly Gly Asp Met Arg Asp Asn Trp Arg Ser Glu Leu Tyr Lys Tyr
500 505 510

Lys Val Val Lys Ile Glu Pro Leu Gly Val Ala Pro Thr Lys Ala Lys
515 520 525

Arg Arg Val Val Gln Arg Glu Lys Arg Ala Val Gly Met Leu Gly Ala
530 535 540

Met Phe Leu Gly Phe Leu Gly Ala Ala Gly Ser Thr Met Gly Ala Ala
545 550 555 560

Ser Met Thr Leu Thr Val Gln Ala Arg Gln Leu Leu Ser Gly Ile Val
565 570 575

Gln Gln Gln Asn Asn Leu Leu Arg Ala Ile Glu Ala Gln Gln His Leu
580 585 590

Leu Gln Leu Thr Val Trp Gly Ile Lys Gln Leu Gln Ala Arg Val Leu
595 600 605

Ala Val Glu Arg Tyr Leu Lys Asp Gln Gln Leu Leu Gly Ile Trp Gly
610 615 620

Cys Ser Gly Lys Leu Ile Cys Thr Thr Ala Val Pro Trp Asn Ala Ser
625 630 635 640

Trp Ser Asn Lys Ser Leu Asp Lys Ile Trp Asn Asn Met Thr Trp Met
645 650 655

Glu Trp Glu Arg Glu Ile Asp Asn Tyr Thr Gly Leu Ile Tyr Thr Leu
660 665 670

Ile Glu Glu Ser Gln Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu
675 680 685

Glu Leu Asp Lys Trp Ala Ser Leu Trp Asn Trp Phe Asp Ile Thr Asn
690 695 700

Trp Leu Trp Tyr Ile Lys Ile Phe Ile Met Ile Val Gly Gly Leu Val
705 710 715 720

Gly Leu Arg Ile Val Phe Ala Val Leu Ser Ile Val Asn Arg Val Arg
725 730 735

Gln Gly Tyr Ser Pro Leu Ser Phe Gln Thr Arg Leu Pro Ala Pro Arg
740 745 750

Gly Pro Asp Arg Pro Glu Gly Ile Glu Glu Glu Gly Gly Glu Arg Asp
755 760 765

Arg Asp Arg Ser Gly Arg Leu Val Asn Gly Phe Leu Ala Leu Ile Trp
770 775 780

Asp Asp Leu Arg Ser Leu Cys Leu Phe Ser Tyr His Arg Leu Arg Asp
785 790 795 800

Leu Leu Leu Ile Val Ala Arg Ile Val Glu Leu Leu Gly Arg Arg Gly
805 810 815

Trp Glu Ala Leu Lys Tyr Trp Trp Asn Leu Leu Gln Tyr Trp Ser Gln
820 825 830

Glu Leu Lys Asn Ser Ala Val Ser Leu Leu Asn Ala Thr Ala Ile Ala
835 840 845

Val Ala Glu Gly Thr Asp Arg Val Ile Glu Val Val Gln Arg Ala Cys
850 855 860

Arg Ala Ile Leu His Ile Pro Arg Arg Ile Arg Gln Gly Leu Glu Arg
865 870 875 880

Ala Leu Leu
<210> 3

<211> 2562 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Ancestral HIV-1 group M, subtype C, env sequence



<400> 3 
atgcgggtga tgggcatcct gcggaactgc cagcagtggt ggatctgggg catcctgggc 60

ttctggatgc tgatgatctg cagcgtgatg ggcaacctgt gggtgaccgt gtactacggc 120

gtgcccgtgt ggaaggaggc caagaccacc ctgttctgcg ccagcgacgc caaggcctac 180

gagcgggagg tgcacaacgt gtgggccacc cacgcctgcg tgcccaccga ccccaacccc 240

caggagatgg tgctggagaa cgtgaccgag aacttcaaca tgtggaagaa cgacatggtg 300

gaccagatgc acgaggacat catcagcctg tgggaccaga gcctgaagcc ctgcgtgaag 360

ctgacccccc tgtgcgtgac cctgaactgc accaacgtga ccaacaccaa caacaacaac 420

aacaccagca tgggcggcga gatcaagaac tgcagcttca acatcaccac cgagctgcgg 480

gacaagaagc agaaggtgta cgccctgttc taccggctgg acatcgtgcc cctgaacgag 540

aacagcaaca gcaacagcag cgagtaccgg ctgatcaact gcaacaccag cgccatcacc 600

caggcctgcc ccaaggtgag cttcgacccc atccccatcc actactgcgc ccccgccggc 660

tacgccatcc tgaagtgcaa caacaagacc ttcaacggca ccggcccctg caacaacgtg 720

agcaccgtgc agtgcaccca cggcatcaag cccgtggtga gcacccagct gctgctgaac 780

ggcagcctgg ccgaggagga gatcatcatc cggagcgaga acctgaccaa caacgccaag 840

accatcatcg tgcacctgaa cgagagcgtg gagatcgtgt gcacccggcc caacaacaac 900

acccggaaga gcatccggat cggccccggc cagaccttct acgccaccgg cgacatcatc 960

ggcgacatcc ggcaggccca ctgcaacatc agcgagaagg agtggaacaa gaccctgcag 1020

cgggtgggca agaagctgaa ggagcacttc cccaacaaga ccatcaagtt cgagcccagc 1080

agcggcggcg acctggagat caccacccac agcttcaact gccggggcga gttcttctac 1140

tgcaacacca gcaagctgtt caacagcacc tacaacagca ccaacaacgg caccaccagc 1200

aacagcacca tcaccctgcc ctgccggatc aagcagatca tcaacatgtg gcagggcgtg 1260

ggccgggcca tgtacgcccc ccccatcgcc ggcaacatca cctgcaagag caacatcacc 1320

ggcctgctgc tgacccggga cggcggcaac accaacaaca ccaccgagac cttccggccc 1380

ggcggcggcg acatgcggga caactggcgg agcgagctgt acaagtacaa ggtggtggag 1440

atcaagcccc tgggcgtggc ccccaccgag gccaagcggc gggtggtgga gcgggagaag 1500

cgggccgtgg gcatcggcgc cgtgttcctg ggcttcctgg gcgccgccgg cagcaccatg 1560

ggcgccgcca gcatcaccct gaccgtgcag gcccggcagc tgctgagcgg catcgtgcag 1620

cagcagagca acctgctgcg ggccatcgag gcccagcagc acatgctgca gctgaccgtg 1680

tggggcatca agcagctgca gacccgggtg ctggccatcg agcggtacct gaaggaccag 1740

cagctgctgg gcatctgggg ctgcagcggc aagctgatct gcaccaccgc cgtgccctgg 1800

aacagcagct ggagcaacaa gagccaggac gacatctggg acaacatgac ctggatgcag 1860

tgggaccggg agatcagcaa ctacaccgac accatctacc ggctgctgga ggacagccag 1920

aaccagcagg agaagaacga gaaggacctg ctggccctgg acagctggaa gaacctgtgg 1980

aactggttcg acatcaccaa ctggctgtgg tacatcaaga tcttcatcat gatcgtgggc 2040

ggcctgatcg gcctgcggat catcttcgcc gtgctgagca tcgtgaaccg ggtgcggcag 2100

ggctacagcc ccctgagctt ccagaccctg acccccaacc cccggggccc cgaccggctg 2160

ggcggcatcg aggaggaggg cggcgagcag gaccgggacc ggagcatccg gctggtgagc 2220

ggcttcctgg ccctggcctg ggacgacctg cggagcctgt gcctgttcag ctaccaccgg 2280

ctgcgggact tcatcctgat cgccgcccgg ggcgtgaacc tgctgggccg gagcagcctg 2340

cggggcctgc agcggggctg ggaggccctg aagtacctgg gcagcctggt gcagtactgg 2400

ggcctggagc tgaagaagag cgccatcagc ctgctggaca ccatcgccat cgccgtggcc 2460

gagggcaccg accggatcat cgagctggtg cagcggatct gccgggccat ccggaacatc 2520

ccccggcgga tccggcaggg cttcgaggcc gccctgcagt ga 2562



<210> 4 

<211> 853 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Ancestral HIV-1 group M, subtype C, env sequence
<400> 4 

Met Arg Val Met Gly Ile Leu Arg Asn Cys Gln Gln Trp Trp Ile Trp
1 5 10 15

Gly Ile Leu Gly Phe Trp Met Leu Met Ile Cys Ser Val Met Gly Asn
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Lys
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Glu Arg Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Met Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asp Met Val Asp Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Thr Asn Val Thr Asn Thr Asn Asn Asn Asn Asn Thr Ser Met
130 135 140

Gly Gly Glu Ile Lys Asn Cys Ser Phe Asn Ile Thr Thr Glu Leu Arg
145 150 155 160

Asp Lys Lys Gln Lys Val Tyr Ala Leu Phe Tyr Arg Leu Asp Ile Val
165 170 175

Pro Leu Asn Glu Asn Ser Asn Ser Asn Ser Ser Glu Tyr Arg Leu Ile
180 185 190

Asn Cys Asn Thr Ser Ala Ile Thr Gln Ala Cys Pro Lys Val Ser Phe
195 200 205

Asp Pro Ile Pro Ile His Tyr Cys Ala Pro Ala Gly Tyr Ala Ile Leu
210 215 220

Lys Cys Asn Asn Lys Thr Phe Asn Gly Thr Gly Pro Cys Asn Asn Val
225 230 235 240

Ser Thr Val Gln Cys Thr His Gly Ile Lys Pro Val Val Ser Thr Gln
245 250 255

Leu Leu Leu Asn Gly Ser Leu Ala Glu Glu Glu Ile Ile Ile Arg Ser
260 265 270

Glu Asn Leu Thr Asn Asn Ala Lys Thr Ile Ile Val His Leu Asn Glu
275 280 285

Ser Val Glu Ile Val Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser
290 295 300

Ile Arg Ile Gly Pro Gly Gln Thr Phe Tyr Ala Thr Gly Asp Ile Ile
305 310 315 320

Gly Asp Ile Arg Gln Ala His Cys Asn Ile Ser Glu Lys Glu Trp Asn
325 330 335

Lys Thr Leu Gln Arg Val Gly Lys Lys Leu Lys Glu His Phe Pro Asn
340 345 350

Lys Thr Ile Lys Phe Glu Pro Ser Ser Gly Gly Asp Leu Glu Ile Thr
355 360 365

Thr His Ser Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys Asn Thr Ser
370 375 380

Lys Leu Phe Asn Ser Thr Tyr Asn Ser Thr Asn Asn Gly Thr Thr Ser
385 390 395 400

Asn Ser Thr Ile Thr Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met
405 410 415

Trp Gln Gly Val Gly Arg Ala Met Tyr Ala Pro Pro Ile Ala Gly Asn
420 425 430

Ile Thr Cys Lys Ser Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly
435 440 445

Gly Asn Thr Asn Asn Thr Thr Glu Thr Phe Arg Pro Gly Gly Gly Asp
450 455 460

Met Arg Asp Asn Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Glu
465 470 475 480

Ile Lys Pro Leu Gly Val Ala Pro Thr Glu Ala Lys Arg Arg Val Val
485 490 495

Glu Arg Glu Lys Arg Ala Val Gly Ile Gly Ala Val Phe Leu Gly Phe
500 505 510

Leu Gly Ala Ala Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr
515 520 525

Val Gln Ala Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Ser Asn
530 535 540

Leu Leu Arg Ala Ile Glu Ala Gln Gln His Met Leu Gln Leu Thr Val
545 550 555 560

Trp Gly Ile Lys Gln Leu Gln Thr Arg Val Leu Ala Ile Glu Arg Tyr
565 570 575

Leu Lys Asp Gln Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu
580 585 590

Ile Cys Thr Thr Ala Val Pro Trp Asn Ser Ser Trp Ser Asn Lys Ser
595 600 605

Gln Asp Asp Ile Trp Asp Asn Met Thr Trp Met Gln Trp Asp Arg Glu
610 615 620

Ile Ser Asn Tyr Thr Asp Thr Ile Tyr Arg Leu Leu Glu Asp Ser Gln
625 630 635 640

Asn Gln Gln Glu Lys Asn Glu Lys Asp Leu Leu Ala Leu Asp Ser Trp
645 650 655

Lys Asn Leu Trp Asn Trp Phe Asp Ile Thr Asn Trp Leu Trp Tyr Ile
660 665 670

Lys Ile Phe Ile Met Ile Val Gly Gly Leu Ile Gly Leu Arg Ile Ile
675 680 685

Phe Ala Val Leu Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro
690 695 700

Leu Ser Phe Gln Thr Leu Thr Pro Asn Pro Arg Gly Pro Asp Arg Leu
705 710 715 720

Gly Gly Ile Glu Glu Glu Gly Gly Glu Gln Asp Arg Asp Arg Ser Ile
725 730 735

Arg Leu Val Ser Gly Phe Leu Ala Leu Ala Trp Asp Asp Leu Arg Ser
740 745 750

Leu Cys Leu Phe Ser Tyr His Arg Leu Arg Asp Phe Ile Leu Ile Ala
755 760 765

Ala Arg Gly Val Asn Leu Leu Gly Arg Ser Ser Leu Arg Gly Leu Gln
770 775 780

Arg Gly Trp Glu Ala Leu Lys Tyr Leu Gly Ser Leu Val Gln Tyr Trp
785 790 795 800

Gly Leu Glu Leu Lys Lys Ser Ala Ile Ser Leu Leu Asp Thr Ile Ala
805 810 815

Ile Ala Val Ala Glu Gly Thr Asp Arg Ile Ile Glu Leu Val Gln Arg
820 825 830

Ile Cys Arg Ala Ile Arg Asn Ile Pro Arg Arg Ile Arg Gln Gly Phe
835 840 845

Glu Ala Ala Leu Gln
850

<210> 5 
<211> 2652 
<212> DNA

<213> Artificial sequence 
<220> 

<223> Semi-optimized ancestral viral sequences for HIV-1 subtypes B and C



<400> 5 
atgagagtga aggggatcag gaagaactat cagcacttgt ggagatgggg caccatgctc 60

cttgggatgt tgatgatctg tagcgccgcc gagaagctgt gggtgaccgt gtactacggc 120

gtgcccgtgt ggaaggaggc caccaccacc ctgttctgcg ccagcgacgc caaggcttac 180

gacaccgagg tccacaacgt gtgggccacc cacgcctgcg tgcccaccga ccccaacccc 240

caggaggtgg tgctggagaa cgtgaccgag aacttcaaca tgtggaagaa caacatggtg 300

gagcagatgc acgaggacat catcagcctg tgggaccaga gcctgaagcc ctgcgtgaag 360

ttaacccccc tgtgcgtgac cctgaactgc accgacgacc tgcgcaccaa cgccaccaac 420

accaccaaca gcagcgccac caccaacacc accagcagcg gcggcggcac gatggagggc 480

gagaagggcg agatcaagaa ctgcagcttc aacgtgacca ccagcatccg cgacaagatg 540

cagaaggagt acgccctgtt ctacaagctg gacgtggtgc ccatcgacaa cgacaacaac 600

aacaccaaca acaacaccag ctaccgcctc atcaactgca acaccagcgt gatcacccag 660

gcctgcccca aggtgagctt cgagcccatc cccatccact actgcacccc cgccggcttc 720

gccatcctga agtgcaacga caagaagttc aacggcaccg gcccctgcac caacgtgagc 780

accgtgcagt gcacccacgg catccgcccc gtggtgagca cccagctgct gctgaacggc 840

agcctggccg aggaggaggt ggtgatccgc agcgagaact tcaccgacaa cgccaagacc 900

atcatcgtgc agctgaacga gagcgtggag atcaactgca cgcgtcccaa caacaacacc 960

cgcaagagca tccccatcgg ccctggccgc gccctgtacg ccaccggcaa gatcatcggc 1020

gacatccgcc aggcccactg caacctgtcg cgagccaagt ggaacaacac cctgaagcag 1080

atcgtgacca agctgcgcga gcagttcggc aacaacaaga ccaccatcgt gttcaaccag 1140

agcagcggcg gcgaccccga gatcgtgatg cacagcttca actgcggcgg cgaattcttc 1200

tactgcaaca gcacccagct gttcaacagc acctggcact tcaacggcac ctggggcaac 1260

aacaacaccg agcgcagcaa caacgccgcc gacgacaacg acaccatcac cctgccctgc 1320

cgcatcaagc agatcatcaa catgtggcag gaggtgggca aggccatgta cgcccccccc 1380

atcagcggcc agatccgctg cagcagcaac atcaccggcc tgctgctgac tcgagacggc 1440

ggcaacaacg agaacaccaa caacaccgac accgagatct tccgccccgg gggcggcgac 1500

atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtgaagat cgagcccctg 1560

ggcgtagcac ccaccaaggc aaagagaaga gtggtgcaga gagaaaaaag cgcagtggga 1620

atgctaggag ctatgttcct tgggttcttg ggagcagcag gaagcactat gggcgcagcg 1680

tcaatgacgc tgaccgtaca ggccagacaa ttattgtctg gtatagtgca gcagcagaac 1740

aatctgctga gggctattga ggcgcaacag catctgttgc aactcacagt ctggggcatc 1800

aagcagctcc aggcaagagt cctggctgtg gaaagatacc taaaggatca gcagctcctg 1860

gggatttggg gttgctctgg aaaactcatc tgcaccactg ctgtgccttg gaatgctagc 1920

tggagcaaca agagcctgga caagatctgg aacaacatga cctggatgga gtgggagcgc 1980

gagatcgaca actacaccgg cctgatctac accctgatcg aggagagcca gaaccagcag 2040

gagaagaacg agcaggagct gctggagctg gacaagtggg ccagcctgtg gaactggttc 2100

gatatcacca actggctgtg gtacatcaag atcttcatca tgatcgtggg cggcctggtg 2160

ggcctgcgca tcgtgttcgc cgtgctgagc atcgtgaacc gcgtgcgcca gggctacagc 2220

cccctgagct tccagaccca cctgccagcc ccgaggggac ccgacaggcc cgaaggaatc 2280

gaagaagaag gtggagagag agacagagac agatccggtc gattagtgaa tggattctta 2340

gcacttatct gggacgacct gcggagcctg tgcctcttca gctaccaccg cttgagcgac 2400

ttactcttga ttgtagcgag gattgtggaa cttctgggac gcagggggtg ggaggccctc 2460

aaatattggt ggaatctcct gcagtactgg agtcaggaac taaagaatag cgccgtgagc 2520

ctgctgaacg ccaccgccat cgccgtggcc gagggcaccg accgcgtgat cgaggtggtg 2580

cagcgcgcct gccgcgccat cctgcacatc ccccgccgca tccgccaggg cctggagcgc 2640

gccctgctgt ga 2652



<210> 6 

<211> 2562 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Semi-optimized ancestral viral sequences for HIV-1 subtypes B and C



<400> 6 
atgagagtga tggggatact gaggaattgt caacaatggt ggatatgggg catcctaggc 60

ttttggatgc taatgatttg tgacgtgatg ggcaacctgt gggtgaccgt gtactacggc 120

gtgcccgtgt ggaaggaggc caagaccacc ctgttctgcg ccagcgacgc caaggcctac 180

gagcgggagg tgcacaacgt gtgggccacc cacgcctgcg tgcccaccga ccccaacccc 240

caggagatgg tgctggagaa cgtgaccgag aacttcaaca tgtggaagaa cgacatggtg 300

gaccagatgc acgaggacat catcagcctg tgggaccaga gcctgaagcc ctgcgtgaag 360

ctgacccccc tgtgcgtgac cctgaactgc accaacgtga ccaacaccaa caacaacaac 420

aacaccagca tgggcggcga gatcaagaac tgcagcttca acatcaccac cgagctgcgg 480

gacaagaagc agaaggtgta cgccctgttc taccggctgg acatcgtgcc cctgaacgag 540

aacagcaaca gcaacagcag cgagtaccgg ctgatcaact gcaacaccag cgccatcacc 600

caggcctgcc ccaaggtgag cttcgacccc atccccatcc actactgcgc ccccgccggc 660

tacgccatcc tgaagtgcaa caacaagacc ttcaacggca ccggcccctg caacaacgtg 720

agcaccgtgc agtgcaccca cggcatcaag cccgtggtga gcacccagct gctgctgaac 780

ggcagcctgg ccgaggagga gatcatcatc cggagcgaga acctgaccaa caacgccaag 840

accatcatcg tgcacctgaa cgagagcgtg gagatcgtgt gcacccggcc caacaacaac 900

acccggaaga gcatccggat cggccccggc cagaccttct acgccaccgg cgacatcatc 960

ggcgacatcc ggcaggccca ctgcaacatc agcgagaagg agtggaacaa gaccctgcag 1020

cgggtgggca agaagctgaa ggagcacttc cccaacaaga ccatcaagtt cgagcccagc 1080

agcggcggcg acctggagat caccacccac agcttcaact gccggggcga gttcttctac 1140

tgcaacacca gcaagctgtt caacagcacc tacaacagca ccaacaacgg caccaccagc 1200

aacagcacca tcaccctgcc ctgccggatc aagcagatca tcaacatgtg gcagggcgtg 1260

ggccgggcca tgtacgcccc ccccatcgcc ggcaacatca cctgcaagag caacatcacc 1320

ggcctgctgc tgacccggga cggcggcaac accaacaaca ccaccgagac cttccggccc 1380

ggcggcggcg acatgcggga caactggcgg agcgagctgt acaagtacaa ggtggtggag 1440

atcaagcccc tgggcgtagc acccactgag gcaaaaagga gagtggtgga gagagaaaaa 1500

agagcagtgg gaataggagc tgtgttcctt gggttcttgg gagcagcagg aagcactatg 1560

ggcgcggcgt caataacgct gacggtacag gccagacaat tattgtctgg tatagtgcaa 1620

cagcaaagca atttgctgag ggctatagag gcgcaacagc atatgttgca actcacggtc 1680

tggggcatta agcagctcca gacaagagtc ctggctatag aaagatacct aaaggatcag 1740

cagctcctgg gcatttgggg ctgctctgga aaactcatct gcaccactgc tgtgccttgg 1800

aactctagct ggagcaacaa gagccaggac gacatctggg acaacatgac ctggatgcag 1860

tgggaccggg agatcagcaa ctacaccgac accatctacc ggctgctgga ggacagccag 1920

aaccagcagg agaagaacga gaaggacctg ctggccctgg acagctggaa gaacctgtgg 1980

aactggttcg acatcaccaa ctggctgtgg tacatcaaga tcttcatcat gatcgtgggc 2040

ggcctgatcg gcctgcggat catcttcgcc gtgctgagca tcgtgaaccg ggtgcggcag 2100

ggctacagcc ccctgagctt ccagaccctt accccaaacc cgaggggacc cgacaggctc 2160

ggaggaatcg aagaagaagg tggagagcaa gacagagaca gatccattcg attagtgagc 2220

ggattcttag cactggcctg ggacgacctg cggagcctgt gcctcttcag ctaccaccga 2280

ttgagagact tcatattgat tgcagccaga gggtgggaac ttctgggacg cagcagtctc 2340

aggggactgc agagggggtg ggaagccctt aagtatctgg gaagtcttgt gcagtattgg 2400

ggtctggagc taaaaaagag tgctattagc ctgctggaca ccatcgccat cgccgtggcc 2460

gagggcaccg accggatcat cgagctggtg cagcggatct gccgggccat ccggaacatc 2520

ccccggcgga tccggcaggg cttcgaggcc gccctgcagt ga 2562
<210> 7

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 7 

gatcctg 7 

<210> 8 

<211> 7 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Consensus sequence, most parsimonious reconstruction of determined ancestral node.

<220> 

<221> variation 

<222> (3)..(3)

<223> w can be an A or T 



<400> 8 
gawcctg 

<210> 9 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 9 

gaacctg 7 
<210> 10

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 10
gaaactc 7

<210> 11

<211> 7

<212> DNA

<213> Artificial sequence



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 11 

gatactc 7 

<210> 12 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, most parsimonious reconstruction of determined ancestral node.

<220> 

<221> variation 

<222> (3)..(3)

<223> w can be an A or T 



<400> 12 
gawactc 
<210> 13
<211> 7 
<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 13
catactc 7

<210> 14

<211> 7

<212> DNA

<213> Artificial sequence



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 14 

catactt 7 

<210> 15 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 15 

catacta 7 



<210> 16 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 
<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 16 

catattg 7 

<210> 17 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, most parsimonious reconstruction of determined ancestral node.

<220> 

<221> variation 

<222> (7)..(7)

<223> v can be an A, C or G



<400> 17 
catactv 



<210> 18 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 18
catgctg 7

<210> 19

<211> 7

<212> DNA

<213> Artificial sequence



<220> 
<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 19 

catactg 7 



<210> 20 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 20 

caagctg 7 



<210> 21 

<211> 7 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 21
catgctg 7

<210> 22

<211> 7

<212> DNA

<213> Artificial sequence



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 22 

cttgctg 7 



<210> 23 
<211> 7 
<212> DNA

<213> Artificial sequence 



<220> 

<223> Consensus sequence, maximum likelihood reconstruction of determined ancestral node.

<400> 23 

cttgctt 7 

<210> 24 

<211> 1503 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B gag gene sequence



<400> 24 
atgggtgcga gagcgtcagt attaagcggg ggagaattag ataaatggga aaaaattcgg 60

ttacggccag ggggaaagaa aaaatataaa ttaaaacata tagtatgggc aagcagggag 120

ctagaacgat tcgcagttaa tcctggcctt ttagaaacat cagaaggctg tagacaaata 180

ctgggacagc tacaaccatc ccttcagaca ggatcagaag aacttagatc attatataat 240

acagtagcag tcctctattg tgtgcatcaa aagatagagg taaaagacac caaggaagct 300

ttagataaga tagaggaaga gcaaaacaaa agtaagaaaa aggcacagca agcagcagct 360

gacacaggaa acagcagcca ggtcagccaa aattacccta tagtgcagaa cctacagggg 420

caaatggtac atcaggccct atcacctaga actttaaatg catgggtaaa agtaatagaa 480

gagaaggctt tcagcccaga agtaataccc atgttttcag cattatcaga aggagccacc 540

ccacaagatt taaacaccat gctaaacaca gtggggggac atcaagcagc catgcaaatg 600

ttaaaagaga ccatcaatga ggaagctgca gaatgggata gattgcatcc agtgcatgca 660

gggcctattg caccaggcca gatgagagaa ccaaggggaa gtgacatagc aggaactact 720

agtacccttc aggaacaaat agcatggatg acaaataatc cacctatccc agtaggagaa 780

atctataaaa gatggataat cctgggatta aataaaatag taagaatgta tagccctgtc 840

agcattctgg acataagaca aggaccaaag gaacccttta gagactatgt agaccggttc 900

tataaaactc taagagccga gcaagcttca caggaggtaa aaaattggat gacagaaacc 960

ttgttggtcc aaaatgcgaa cccagattgt aagactatct taaaagcatt gggaccagga 1020

gctacactag aagaaatgat gacagcatgt cagggagtgg ggggacccgg ccataaagca 1080

agagttttgg ctgaagcaat gagccaagta acaaattcag ctaccataat gatgcagaga 1140

ggcaatttta ggaacccaag aaagactgtt aagtgtttca attgtggcaa agaagggcac 1200

atagccagaa attgcagggc ccctaggaaa aagggctgtt ggaaatgtgg aaaggaagga 1260 

caccaaatga aagattgtac tgagagacag gctaattttt tagggaaaat ctggccttcc 1320 

cacaagggaa ggccagggaa ttttcttcag agcagaccag agccaacagc cccaccagaa 1380 

gagagcttca ggtttgggga agagacaaca actccctctc agaagcagga gcagaaagac 1440 

aaggaactgt atcctttagc ttccctcaaa tcactctttg gcaacgaccc ctcgtcacaa 1500 

taa 1503 

<210> 25 

<211> 1503 

<212> DNA 

<213> Artificial sequence 



<220> 



<223> Least squares center of tree reconstruction of clade B gag gene sequence



<400> 25 
atgggtgcga gagcgtcagt attaagcggg ggagaattag atagatggga aaaaattcgg 60

ttaaggccag ggggaaagaa aaaatataga ttaaaacata tagtatgggc aagcagggag 120

ctagaacgat tcgcagttaa tcctggcctg ttagaaacat cagaaggctg tagacaaata 180

ctgggacagc tacaaccatc ccttcagaca ggatcagaag aacttagatc attatataat 240

acagtagcaa ccctctattg tgtgcatcaa aggatagagg taaaagacac caaggaagct 300

ttagagaaga tagaggaaga gcaaaacaaa agtaagaaaa aggcacagca agcagcagct 360

gacacaggaa acagcagcca ggtcagccaa aattacccta tagtgcagaa cctccagggg 420

caaatggtac atcaggccat atcacctaga actttaaatg catgggtaaa agtagtagag 480

gagaaggctt tcagcccaga agtaataccc atgttttcag cattatcaga aggagccacc 540

ccacaagatt taaacaccat gctaaacaca gtggggggac atcaagcagc catgcaaatg 600

ttaaaagaga ccatcaatga ggaagctgca gaatgggata gattgcatcc agtgcatgca 660

gggcctattg caccaggcca gatgagagaa ccaaggggaa gtgacatagc aggaactact 720

agtacccttc aggaacaaat aggatggatg acaaataatc cacctatccc agtaggagaa 780

atctataaaa gatggataat cctgggatta aataaaatag taagaatgta tagccctacc 840

agcattctgg acataagaca aggaccaaag gaacccttta gagactatgt agaccggttc 900

tataaaactc taagagccga gcaagcttca caggaggtaa aaaattggat gacagaaacc 960

ttgttggtcc aaaatgcgaa cccagattgt aagactattt taaaagcatt gggaccagca 1020

gctacactag aagaaatgat gacagcatgt cagggagtgg ggggacccgg ccataaagca 1080

agagttttgg ctgaagcaat gagccaagta acaaattcag ctaccataat gatgcagaga 1140

ggcaatttta ggaaccaaag aaagactgtt aagtgtttca attgtggcaa agaagggcac 1200 

atagccaaaa attgcagggc ccctaggaaa aagggctgtt ggaaatgtgg aaaggaagga 1260 

caccaaatga aagattgtac tgagagacag gctaattttt tagggaagat ctggccttcc 1320 

cacaagggaa ggccagggaa ttttcttcag agcagaccag agccaacagc cccaccagaa 1380 

gagagcttca ggtttgggga agagacaaca actccctctc agaagcagga gccgatagac 1440 

aaggaactgt atcctttagc ttccctcaga tcactctttg gcaacgaccc ctcgtcacaa 1500 

taa 1503 

<210> 26 
<211> 1503 
<212> DNA 

<213> Artificial sequence 



<220> 



<223> Minimum of means center of tree reconstruction of clade B gag gene sequence



<400> 26 
atgggtgcgg gagcgtcggt attaagcggg ggaaaattag ataggtggga aaaaattcgg 60

ttaaggccag ggggaaagaa aaaatataaa ttaaaacata tagtatgggc aagcagggag 120

ctagaacgat ttgcagtcaa tcctggcctg ttagaaacat cagaaggctg cagacgaata 180

ctggaacagc tacatccatc ccttcagaca ggatcagaag aacttaaatc attatataat 240

acggtagcaa ccctctattg tgtgcatcaa aatatagagg taagagacac caaggatgct 300

ttagaaaaaa tagaggaaga acaaaacaaa attaagaaaa gggcacagca agcagcagct 360

gacacaggaa acagcaaccc ggtcagccaa aattacccta tagtgcagaa tatgcagggg 420

caaatggtac atcaggccat atcacctaga actttaaatg catgggtaaa agtagtagaa 480

gagaaggctt tcagccccga agtaataccc atgttttcag cattatcaga aggagccacc 540

ccacaagatt taaacaccat gctaaacaca gtggggggac atcaagcagc catgcaaatg 600

ttaaaagaaa ccatcaatga ggaagctgca gaatgggata gattgcaccc agtgcatgca 660

gggcctattg caccaggcca gatgagagaa ccaaggggaa gtgacatagc aggaactact 720

agtacccttc aggaacaaat aggatggatg acacataatc cacctatccc agtaggagaa 780

atctataaaa gatggataat catgggatta aataaaatag taagaatgta tagccctacc 840

agcattctgg acataagaca aggaccaaag gaacccttta gagattatgt tgaccggttc 900

tataaaactc taagagccga gcaagcttca caggaggtaa aaaattggat gacagaaacc 960

ttgttggtcc aaaatgcgaa cccagattgt aagaccattt taaaagcatt aggaccagca 1020

gctacactag aagaaatgat gacagcatgt cagggagtgg gagggcccag ccataaagca 1080

agagttttgg cagaagcaat gagccaagca acaaattcag ctaccataat gatgcagagg 1140

ggcaatttta agggccaaag aaagactgtt aaatgtttca attgtggcaa agaagggcac 1200 

atagccagaa attgcagggc ccctagaaaa aagggctgtt ggaaatgtgg aaaggaagga 1260 

caccaaatga aagattgtac tgagagacag gctaattttt tagggaagat ctggccttcc 1320 

cacaagggaa ggccagggaa ttttctccaa agcaggccag agccaacagc cccaccagaa 1380 

gagagcttca ggtttgggga ggagacaaca actccccctc agaagcagga gccgagggac 1440 

aaggaacagt atcccttgac ttccctcaga tcactctttg gcaacgaccc atcgtcacaa 1500 

taa 1503 

<210> 27 

<211> 2589 

<212> DNA 

<213> Artificial sequence 



<220> 



<223> Most recent common ancestor reconstruction of clade B env gene sequence



<400> 27 
atgagagtga aggggatcag gaagaattgt cagcacttgt ggaaatgggg caccatgctc 60

cttgggatgt tgatgatctg tagtgctgca gaaaacttgt gggtcacagt ctattatggg 120

gtacctgtgt ggaaagaagc aaccaccact ctattttgtg catcagatgc taaagcatat 180

aaaacagagg tacataatgt ctgggccaca catgcctgtg tacccacaga ccccaaccca 240

caagaagtag tattggaaaa tgtgacagaa aattttaaca tgtggaaaaa taacatggta 300

gaacagatgc atgaggatat aatcagttta tgggatcaaa gcctaaagcc atgtgtaaaa 360

ttaaccccac tctgtgttac tttaaattgc actgatgcga acaagaatgc tactaatacc 420

aatagtagta gtgggggaac aatggagaaa ggagaaatga aaaactgctc tttcaatatc 480

accacaagca taagagataa gatgcagaaa gaatatgcac ttttttataa acttgatgta 540

gtaccaatag ataatgataa taatagtaat aataatacca actataggtt gataaattgt 600

aatacctcag tcattacaca ggcctgtcca aaggtatcct ttgagccaat tcccatacat 660

tattgtaccc cggctggttt tgcgattcta aagtgtaatg ataagaagtt caatggaaca 720

ggaccatgta aaaatgtcag cacagtacaa tgtacacatg gaattaggcc agtagtgtca 780

actcaactgc tgttaaatgg cagtctagca gaagaagagg tagtaattag atctgaaaat 840

ttcacggaca atgctaaaac cataatagta cagctgaatg aatctgtaga aattaattgt 900

acaagaccca acaacaatac aagaaaaagt atacctatag gaccagggag agcactttat 960

acaacaggag aaataatagg agatataaga caagcacatt gtaacattag tagagcaaaa 1020

tggaataaca ctttaaaaca ggtagttaca aaattaagag aacaatttgg gaataataaa 1080

acaatagtct ttaatccatc ctcaggaggg gacccagaaa ttgtaatgca cagttttaat 1140

tgtggagggg aatttttcta ctgtaataca acacaactgt ttaatagtac ttggaatagt 1200

actgaagggt caaataaaac tacagggtca aataacactg gaggagaaac tatcacactc 1260

ccatgcagaa taaaacaaat tataaacatg tggcaggaag taggaaaagc aatgtatgcc 1320

cctcccatca gaggacaaat taaatgttca tcaaatatta cagggctact attaacaaga 1380

gatggtggtg aaaatagtac caatgagacc gagatcttca gacctggagg aggagatatg 1440

agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga accattagga 1500

gtagcaccca ccaaggcaaa gagaagagtg gtgcaaagag aaaaaagagc agtgggaata 1560

ataggagcta tgttccttgg gttcttggga gcagcaggaa gcactatggg cgcagcgtca 1620

atgacgctga cggtacaggc cagacaatta ttgtctggta tagtgcaaca gcaaaacaat 1680

ttgctgaggg ctattgaggc gcaacagcat ctgttgcaac tcacggtctg gggcatcaaa 1740

cagctccagg caagagtcct ggctgtggaa agatacctaa gggatcaaca gctcctagga 1800

atttggggtt gctctggaaa actcatttgc accactactg tgccttggaa tgctagttgg 1860

agtaataaat ctctggataa gatttggaat aacatgacct ggatggagtg ggaaagagaa 1920

attgacaatt acacaggctt aatatacaac ttaattgaag aatcgcagaa ccagcaagaa 1980

aagaatgaac aagaattatt ggaattggat aagtgggcaa gtttgtggaa ttggtttgac 2040

ataacacaat ggctgtggta tataaaaata ttcataatga tagtaggagg cttggtaggt 2100

ttaagaatag tttttgctgt gctttctata gtgaatagag ttaggcaggg atactcacca 2160

ttatcatttc agacccgcct cccagccccg aggggacccg acaggcccga aggaatcgaa 2220

gaagaaggtg gagagagaga cagagacaga tccggtcgat tagtgaatgg attcttagca 2280

cttatctggg acgatctgcg gagcctgtgc ctcttcagct accaccgctt gagagactta 2340

ctcttgattg tagcgaggat tgtggaactt ctgggacgca gggggtggga agccctcaaa 2400

tattggtgga atctcctgca gtattggagt caggaactaa agaatagtgc tgttagcttg 2460

cttaatgcca cagcaatagc agtagctgag gggacagata gggttataga agtagtacaa 2520

agagcttgta gagctattct tcacatacct agaagaataa gacagggctt agaaagggct 2580

ttgctataa 2589



<210> 28 

<211> 2589 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares center of tree and minimum of means center of tree reconstruction of clade B env gene sequence



<400> 28 
atgagagtga aggggatcag gaagaattat cagcacttgt ggagatgggg caccatgctc 60

cttgggatgt tgatgatctg tagtgctgca gaaaaattgt gggtcacagt ctattatggg 120

gtacctgtgt ggaaagaagc aaccaccact ctattttgtg catcagatgc taaagcatat 180

gatacagagg tacataatgt ttgggccaca catgcctgtg tacccacaga ccccaaccca 240

caagaagtag tattggaaaa tgtgacagaa aattttaaca tgtggaaaaa taacatggta 300

gaacagatgc atgaggatat aatcagttta tgggatcaaa gcctaaagcc atgtgtaaaa 360

ttaaccccac tctgtgttac tttaaattgc actgatttga ataagaatgc tactaatacc 420

aatagtagta gcggggaaat gatggagaaa ggagaaataa aaaactgctc tttcaatatc 480

accacaagca taagagataa ggtgcagaaa gaatatgcac ttttttataa acttgatgta 540

gtaccaatag ataatgataa taatactaat aatactacca gctataggtt gataagttgt 600

aacacctcag tcattacaca ggcctgtcca aaggtatcct ttgagccaat tcccatacat 660

tattgtgccc cggctggttt tgcgattcta aagtgtaatg ataagaagtt caatggaaca 720

ggaccatgta caaatgtcag cacagtacaa tgtacacatg gaattaggcc agtagtatca 780

actcaactgc tgttaaatgg cagtctagca gaagaagagg tagtaattag atctgacaat 840

ttcacggaca atgctaaaac cataatagta cagctgaatg aatctgtaga aattaattgt 900

acaagaccca acaacaatac aagaaaaagt atacatatag gaccagggag agcattttat 960

acaacaggag aaataatagg agatataaga caagcacatt gtaacattag tagagcaaaa 1020

tggaataaca ctttaaaaca gatagttaaa aaattaagag aacaatttgg gaataataaa 1080

acaatagtct ttaatcaatc ctcaggaggg gacccagaaa ttgtaatgca cagttttaat 1140

tgtggagggg aatttttcta ctgtaattca acacaactgt ttaatagtac ttggaatggt 1200

acttggactt ggaatactac tgaagggtca aatgacactg aaggagacac tatcacactc 1260

ccatgcagaa taaaacaaat tataaacatg tggcaggaag taggaaaagc aatgtatgcc 1320

cctcccatca gaggacaaat tagatgttca tcaaatatta cagggctgct attaacaaga 1380

gatggtggta ataataacac caacgagacc gagatcttca gacctggagg aggagatatg 1440

agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga accattagga 1500

gtagcaccca ccaaggcaaa gagaagagtg gtgcagagag aaaaaagagc agtgggaata 1560

ataggagctg tgttccttgg gttcttggga gcagcaggaa gcactatggg cgcagcgtca 1620

atgacgctga cggtacaggc cagacaatta ttgtctggta tagtgcaaca gcagaacaat 1680

ttgctgaggg ctattgaggc gcaacagcat ctgttgcaac tcacagtctg gggcatcaag 1740

cagctccagg caagagtcct ggctgtggaa agatacctaa gggatcaaca gctcctgggg 1800

atttggggtt gctctggaaa actcatttgc accactgctg tgccttggaa tgctagttgg 1860

agtaataaat ctctggatga gatttggaat aacatgacct ggatggagtg ggaaagagaa 1920

attgacaatt acacaagctt aatatacacc ttaattgaag aatcgcaaaa ccaacaagaa 1980

aagaatgaac aagaattatt ggaattagat aaatgggcaa gtttgtggaa ttggtttgac 2040

ataacaaact ggctgtggta tataaaaata ttcataatga tagtaggagg cttggtaggt 2100

ttaagaatag tttttgctgt actttctata gtgaatagag ttaggcaggg atactcacca 2160

ttatcgtttc agacccgcct cccagccccg aggggacccg acaggcccga aggaatcgaa 2220

gaagaaggtg gagagagaga cagagacaga tccggtcgat tagtgaacgg attcttagca 2280

cttatctggg acgacctgcg gagcctgtgc ctcttcagct accaccgctt gagagactta 2340

ctcttgattg taacgaggat tgtggaactt ctgggacgca gggggtggga agccctcaaa 2400

tattggtgga atctcctaca gtattggagt caggaactaa agaatagtgc tgttagcttg 2460

ctcaatgcca cagccatagc agtagctgag gggacagata gggttataga agtagtacaa 2520

agagcttgta gagctattct ccacatacct acaagaataa gacagggctt ggaaagggct 2580

ttgctataa 2589



<210> 29 

<211> 621 

<212> DNA 

<213> Artificial sequence 
<220> 



<223> Most recent common ancestor reconstruction of clade B nef gene sequence



<400> 29 
atgggtggca agtggtcaaa acgtagtgtg gttggatggc ctgctgtaag ggaaagaatg 60

agacgagctg agccagcagc agatggggtg ggagcagtat ctcgagacct ggaaaaacat 120

ggagcaatca caagtagcaa tacagcagct actaatgctg cttgtgcctg gctagaagca 180

caagaggagg aggaggtggg ttttccagtc agacctcagg tacctttaag accaatgact 240

tacaaggcag ctgtagatct tagccacttt ttaaaagaaa aggggggact ggaagggcta 300

gtttactccc aaaaaagaca agatatcctt gatctgtggg tctaccacac acaaggctac 360

ttccctgatt ggcagaacta cacaccaggg ccagggacca gatatccact gacctttgga 420

tggtgcttca agctagtacc agttgagcca gagaaggtag aagaggccac tgaaggagag 480

aacaacagct tgttacaccc tatgagcctg catggaatgg atgacccgga gagagaagtg 540

ttagtgtgga ggtttgacag ccgcctagca tttcatcaca tggcccgaga gaagcatccg 600

gagtactaca aggactgctg a 621



<210> 30 
<211> 621 
<212> DNA 
<213> Artificial sequence
<220> 



<223> Least squares center of tree reconstruction of clade B nef gene sequence



<400> 30 
atgggtggca agtggtcaaa acgtagtgtg gttggatggc ctgctgtaag ggaaagaatg 60

agacgagctg agccagcagc agatggggtg ggagcagtat ctcgagacct ggaaaaacat 120

ggagcaatca caagtagcaa tacagcagct actaatgctg attgtgcctg gctagaagca 180

caagaggagg aggaggtggg ttttccagtc agacctcagg tacctttaag accaatgact 240

tacaaggcag ctttagatct tagccacttt ttaaaagaaa aggggggact ggaagggcta 300

atttactccc aaaaaagaca agatatcctt gatctgtggg tctaccacac acaaggctac 360

ttccctgatt ggcagaacta cacaccaggg ccagggatca gatatccact gacctttgga 420

tggtgcttca agctagtacc agttgagcca gagaaggtag aagaggccaa tgaaggagag 480

aacaacagct tgttacaccc tatgagcctg catgggatgg atgacccgga gaaagaagtg 540

ttagtgtgga agtttgacag ccgcctagca tttcatcaca tggcccgaga gctgcatccg 600

gagtactaca aggactgctg a 621



<210> 31 

<211> 621 

<212> DNA 

<213> Artificial sequence 
<220> 



<223> Minimum of means center of tree reconstruction of clade B nef gene sequence



<400> 31 
atgggtggca agtggtcaaa acgtagtgtg gttggatggc ctgctgtaag ggaaagaatg 60

agacgagctg agccagcagc agatggggtg ggagcagtat ctcgagacct ggaaaaacat 120

ggagcaatca caagtagcaa tacagcagct actaatgctg attgtgcctg gctagaagca 180

caagaggagg aggaggtggg ttttccagtc agacctcagg tacctttaag accaatgact 240

tacaaggcag ctttagatct tagccacttt ttaaaagaaa aggggggact ggaagggcta 300

atttactccc aaaaaagaca agatatcctt gatctgtggg tctaccacac acaaggctac 360

ttccctgatt ggcagaacta cacaccaggg ccagggatca gatatccact gacctttgga 420

tggtgcttca agctagtacc agttgagcca gagaaggtag aagaggccaa tgaaggagag 480

aacaactgct tgttacaccc tatgagccag catgggatgg atgacccgga gaaagaagtg 540

ttagtgtgga agtttgacag ccgcctagca tttcatcaca tggcccgaga gctgcatccg 600



gagtactaca aggactgctg a 621 

<210> 32 

<211> 3012 

<212> DNA 

<213> Artificial Sequence 



<220> 



<223> Most recent common ancestor reconstruction of clade B pol gene sequence



<400> 32 
ttttttaggg aaaatctggc cttcccacaa gggaaggcca gggaactttc ttcagagcag 60

accagagcca acagccccac cagaagagag cttcaggttt ggggaagaga caacaactcc 120

ctctcagaag caggagcaga tagacaagga actgtatcct ttagcttccc tcaaatcact 180

ctttggcaac gacccctcgt cacaataaag ataggggggc aactaaagga agctctatta 240

gatacaggag cagatgatac agtattagaa gaaatgaatt tgccaggaaa atggaaacca 300

aaaatgatag ggggaattgg aggttttatc aaagtaagac agtatgatca aatacccata 360

gaaatctgtg gacataaagc tataggtaca gtattagtag gacctacacc tgtcaacata 420

attggaagaa atctgttgac tcagattggt tgcactttaa attttcccat tagtcctatt 480

gaaactgtac cagtaaaatt aaagccagga atggatggcc caaaagttaa acaatggcca 540

ttgacagaag aaaaaataaa agcattagta gaaatttgta cagaaatgga aaaggaagga 600

aaaatttcaa aaattgggcc tgaaaatcca tacaatactc cagtatttgc cataaagaaa 660

aaagacagta ctaaatggag aaaattagta gatttcagag aacttaataa gagaactcaa 720

gacttctggg aagttcaatt aggaatacca catcctgcag ggttaaaaaa gaaaaaatca 780

gtaacagtac tggatgtggg tgatgcatat ttttcagttc ccttagatga agacttcagg 840

aagtatactg catttaccat acctagtata aacaatgaga caccagggat tagatatcag 900

tacaatgtgc ttccacaggg atggaaagga tcaccagcaa tattccaaag tagcatgaca 960

aaaatcttag agccttttag aaaacaaaat ccagaaatag ttatctatca atacatggat 1020

gatttgtatg taggatctga cttagaaata gggcagcata gaacaaaaat agaggaactg 1080

agagaacatc tgttgaggtg gggatttacc acaccagaca aaaaacatca gaaagaacct 1140

ccatttcttt ggatgggtta tgaactccat cctgataaat ggacagtaca gcctatagtg 1200

ctgccagaaa aagacagctg gactgtcaat gacatacaga agttagtggg aaaattgaat 1260

tgggcaagtc agatttatgc agggattaaa gtaaagcaat tatgtaaact ccttagggga 1320

accaaagcac taacagaagt agtaccacta acagaagaag cagagctaga actggcagaa 1380

aacagggaga ttctaaaaga accagtacat ggagtgtatt atgacccatc aaaagactta 1440

atagcagaaa tacagaagca ggggcaaggc caatggacat atcaaattta tcaagagcca 1500

tttaaaaatc tgaaaacagg aaagtatgca agaatgaggg gtgcccacac taatgatgta 1560

aaacaattaa cagaggcagt gcaaaaaata gccacagaaa gcatagtaat atggggaaag 1620

actcctaaat ttaaactacc catacaaaag gaaacatggg aagcatggtg gacagagtat 1680

tggcaagcca cctggattcc tgagtgggag tttgtcaata cccctccctt agtaaaatta 1740

tggtaccagt tagagaaaga acccatagta ggagcagaaa ctttctatgt agatggggca 1800

gctaatagag agactaaatt aggaaaagca ggatatgtta ctgacagagg aagacaaaaa 1860

gttgtctccc taactgacac aacaaatcag aagactgagt tacaagcaat tcatctagct 1920

ttgcaggatt cgggattaga agtaaacata gtaacagact cacaatatgc attaggaatc 1980

attcaagcac aaccagataa gagtgaatca gagttagtca gtcaaataat agagcagtta 2040

ataaaaaagg aaaaggtcta cctggcatgg gtaccagcac acaaaggaat tggaggaaat 2100

gaacaagtag ataaattagt cagtactgga atcaggaaag tactattttt ggatggaata 2160

gataaggccc aagaagaaca tgagaaatat cacagtaatt ggagagcaat ggctagtgat 2220

tttaacctgc cacctgtagt agcaaaagaa atagtagcca gctgtgataa atgtcagcta 2280

aaaggagaag ccatgcatgg acaagtagac tgtagtccag gaatatggca actagattgt 2340

acacatttag aaggaaaagt tatcctggta gcagttcatg tagccagtgg ctatatagaa 2400

gcagaagtta ttccagcaga aacagggcag gaaacagcat actttctctt aaaattagca 2460

ggaagatggc cagtaaaagt aatacataca gacaatggca gcaatttcac cagtactaca 2520

gttaaggccg cctgttggtg ggcagggatc aagcaggaat ttggcattcc ctacaatccc 2580

caaagtcaag gagtagtaga atctatgaat aaagaattaa agaaaattat aggacaggta 2640

agagatcagg ctgaacatct taagacagca gtacaaatgg cagtattcat ccacaatttt 2700

aaaagaaaag gggggattgg ggggtacagt gcaggggaaa gaatagtaga cataatagca 2760

acagacatac aaactaaaga actacaaaaa caaattacaa aaattcaaaa ttttcgggtt 2820

tattacaggg acagcagaga tccactttgg aaaggaccag caaagcttct ctggaaaggt 2880

gaaggggcag tagtaataca agataatagt gacataaaag tagtgccaag aagaaaagca 2940

aagatcatta gggattatgg aaaacagatg gcaggtgatg attgtgtggc aagtagacag 3000

gatgaggatt ag 3012



<210> 33 

<211> 3012 

<212> DNA 

<213> Artificial Sequence 
<220> 
<223> Least squares center of tree reconstruction of clade B pol gene sequence



<400> 33 
ttttttaggg aagatctggc cttcccacaa gggaaggcca gggaattttc ttcagagcag 60

accagagcca acagccccac cagaagagag cttcaggttt ggggaagaga caacaactcc 120

ctctcagaag caggagccga tagacaagga actgtatcct ttagcttccc tcagatcact 180

ctttggcaac gacccctcgt cacaataaag ataggggggc aactaaagga agctctatta 240

gatacaggag cagatgatac agtattagaa gaaatgaatt tgccaggaag atggaaacca 300

aaaatgatag ggggaattgg aggttttatc aaagtaagac agtatgatca gatacccata 360

gaaatctgtg gacataaagc tataggtaca gtattagtag gacctacacc tgtcaacata 420

attggaagaa atctgttgac tcagattggt tgcactttaa attttcccat tagtcctatt 480

gaaactgtac cagtaaaatt aaagccagga atggatggcc caaaagttaa acaatggcca 540

ttgacagaag aaaaaataaa agcattagta gaaatttgta cagaaatgga aaaggaaggg 600

aaaatttcaa aaattgggcc tgaaaatcca tacaatactc cagtatttgc cataaagaaa 660

aaagacagta ctaaatggag aaaattagta gatttcagag aacttaataa gagaactcaa 720

gacttctggg aagttcaatt aggaatacca catcccgcag ggttaaaaaa gaaaaaatca 780

gtaacagtac tggatgtggg tgatgcatat ttttcagttc ccttagatga agacttcagg 840

aagtatactg catttaccat acctagtata aacaatgaga caccagggat tagatatcag 900

tacaatgtgc ttccacaggg atggaaagga tcaccagcaa tattccaaag tagcatgaca 960

aaaatcttag agccttttag aaaacaaaat ccagacatag ttatctatca atacatggat 1020

gatttgtatg taggatctga cttagaaata gggcagcata gaacaaaaat agaggaactg 1080

agacaacatc tgttgaggtg gggatttacc acaccagaca aaaaacatca gaaagaacct 1140

ccattccttt ggatgggtta tgaactccat cctgataaat ggacagtaca gcctatagtg 1200

ctgccagaaa aagacagctg gactgtcaat gacatacaga agttagtggg aaaattgaat 1260

tgggcaagtc agatttatgc agggattaaa gtaaagcaat tatgtaaact ccttagggga 1320

accaaagcac taacagaagt aataccacta acagaagaag cagagctaga actggcagaa 1380

aacagggaga ttctaaaaga accagtacat ggagtgtatt atgacccatc aaaagactta 1440

atagcagaaa tacagaagca ggggcaaggc caatggacat atcaaattta tcaagagcca 1500

tttaaaaatc tgaaaacagg aaagtatgca agaatgaggg gtgcccacac taatgatgta 1560

aaacaattaa cagaggcagt gcaaaaaata gccacagaaa gcatagtaat atggggaaag 1620

actcctaaat ttaaactacc catacaaaaa gaaacatggg aagcatggtg gacagagtat 1680

tggcaagcca cctggattcc tgagtgggag tttgtcaata cccctccctt agtgaaatta 1740

tggtaccagt tagagaaaga acccatagta ggagcagaaa ctttctatgt agatggggca 1800

gctaataggg agactaaatt aggaaaagca ggatatgtta ctgacagagg aagacaaaaa 1860

gttgtctccc taactgacac aacaaatcag aagactgagt tacaagcaat tcatctagct 1920

ttgcaggatt cgggattaga agtaaacata gtaacagact cacaatatgc attaggaatc 1980

attcaagcac aaccagataa gagtgaatca gagttagtca gtcaaataat agagcagtta 2040

ataaaaaagg aaaaggtcta cctggcatgg gtaccagcac acaaaggaat tggaggaaat 2100

gaacaagtag ataaattagt cagtgctgga atcaggaaag tactattttt ggatggaata 2160

gataaggccc aagaagaaca tgagaaatat cacagtaatt ggagagcaat ggctagtgat 2220

tttaacctgc cacctgtagt agcaaaagaa atagtagcca gctgtgataa atgtcagcta 2280

aaaggagaag ccatgcatgg acaagtagac tgtagtccag gaatatggca actagattgt 2340

acacatttag aaggaaaagt tatcctggta gcagttcatg tagccagtgg atatatagaa 2400

gcagaagtta ttccagcaga gacagggcag gaaacagcat actttctctt aaaattagca 2460

ggaagatggc cagtaaaaac aatacataca gacaatggca gcaatttcac cagtactacg 2520

gttaaggccg cctgttggtg ggcagggatc aagcaggaat ttggcattcc ctacaatccc 2580

caaagtcaag gagtagtaga atctatgaat aaagaattaa agaaaattat aggacaggta 2640

agagatcagg ctgaacatct taagacagca gtacaaatgg cagtattcat ccacaatttt 2700

aaaagaaaag gggggattgg ggggtacagt gcaggggaaa gaatagtaga cataatagca 2760

acagacatac aaactaaaga attacaaaaa caaattacaa aaattcaaaa ttttcgggtt 2820

tattacaggg acagcagaga tccactttgg aaaggaccag caaagcttct ctggaaaggt 2880

gaaggggcag tagtaataca agataatagt gacataaaag tagtgccaag aagaaaagca 2940

aagatcatta gggattatgg aaaacagatg gcaggtgatg attgtgtggc aagtagacag 3000

gatgaggatt ag 3012



<210> 34 

<211> 3012 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Minimum of means center of tree reconstruction of clade B pol gene sequence

<400> 34 

ttttttaggg aagatctggc cttcccacaa gggaaggcca gggaattttc ttcagagcag 60 

accagagcca acagccccac cagaagagag cttcaggttt ggggaagaga caacaactcc 120 

ctctcagaag caggagccga tagacaagga actgtatcct ttagcttccc tcagatcact 180 

ctttggcaac gacccctcgt cacaataaag ataggggggc aactaaagga agctctatta 240 

gatacaggag cagatgatac agtattagaa gaaatgaatt tgccaggaag atggaaacca 300 

aaaatgatag ggggaattgg aggttttatc aaagtaagac agtatgatca gatactcata 360 

gaaatctgtg gacataaagc tataggtaca gtattagtag gacctacacc tgtcaacata 420 

attggaagaa atctgttgac tcagattggt tgcactttaa attttcccat tagtcctatt 480

gaaactgtac cagtaaaatt aaagccagga atggatggcc caaaagttaa acaatggcca 540

ttgacagaag aaaaaataaa agcattagta gaaatttgta cagaaatgga aaaggaaggg 600

aaaatttcaa aaattgggcc tgaaaatcca tacaatactc cagtatttgc cataaagaaa 660

aaagacagta ctaaatggag aaaattagta gatttcagag aacttaataa gagaactcaa 720

gacttctggg aagttcaatt aggaatacca catcccgcag ggttaaaaaa gaaaaaatca 780

gtaacagtac tggatgtggg tgatgcatat ttttcagttc ccttagatga agacttcagg 840

aagtatactg catttaccat acctagtata aacaatgaga caccagggat tagatatcag 900

tacaatgtgc ttccacaggg atggaaagga tcaccagcaa tattccaaag tagcatgaca 960

aaaatcttag agccttttag aaaacaaaat ccagacatag ttatctatca atacatggat 1020

gatttgtatg taggatctga cttagaaata gggcagcata gaacaaaaat agaggaactg 1080

agacaacatc tgttgaggtg gggatttacc acaccagaca aaaaacatca gaaagaacct 1140

ccattccttt ggatgggtta tgaactccat cctgataaat ggacagtaca gcctatagtg 1200

ctgccagaaa aagacagctg gactgtcaat gacatacaga agttagtggg aaaattgaat 1260

tgggcaagtc agatttaccc agggattaaa gtaaagcaat tatgtaaact ccttagggga 1320

accaaagcac taacagaagt aataccacta acagaagaag cagagctaga actggcagaa 1380

aacagggaaa ttctaaaaga accagtacat ggagtgtatt atgacccatc aaaagactta 1440

atagcagaaa tacagaagca ggggcaaggc caatggacat atcaaattta tcaagagcca 1500

tttaaaaatc tgaaaacagg aaaatatgca agaatgaggg gtgcccacac taatgatgta 1560

aaacaattaa cagaggcagt gcaaaaaata gccacagaaa gcatagtaat atggggaaag 1620

actcctaaat ttaaactacc catacaaaaa gaaacatggg aaacatggtg gacagagtat 1680

tggcaagcca cctggattcc tgagtgggag tttgtcaata cccctccctt agtgaaatta 1740

tggtaccagt tagagaaaga acccatagta ggagcagaaa ctttctatgt agatggggca 1800

gctaacaggg agactaaatt aggaaaagca ggatatgtta ctaacagagg aagacaaaaa 1860

gttgtctccc taactgacac aacaaatcag aagactgagt tacaagcaat tcatctagct 1920

ttgcaggatt cgggattaga agtaaacata gtaacagact cacaatatgc attaggaatc 1980

attcaagcac aaccagataa aagtgaatca gagttagtca gtcaaataat agagcagtta 2040

ataaaaaagg aaaaggtcta cctggcatgg gtaccagcac acaaaggaat tggaggaaat 2100

gaacaagtag ataaattagt cagtgctgga atcaggaaag tactattttt agatggaata 2160

gataaggccc aagaagaaca tgagaaatat cacagtaatt ggagagcaat ggctagtgat 2220

tttaacctgc cacctgtagt agcaaaagaa atagtagcca gctgtgataa atgtcagcta 2280

aaaggagaag ccatgcatgg acaagtagac tgtagtccag gaatatggca actagattgt 2340

acacatttag aaggaaaagt tatcctggta gcagttcatg tagccagtgg atatatagaa 2400

gcagaagtta ttccagcaga gacagggcag gaaacagcat actttctctt aaaattagca 2460

ggaagatggc cagtaaaaac aatacataca gacaatggca gcaatttcac cagtactacg 2520

gttaaggccg cctgttggtg ggcggggatc aagcaggaat ttggcattcc ctacaatccc 2580

caaagtcaag gagtagtaga atctatgaat aaagaattaa agaaaattat aggacaggta 2640

agagatcagg ctgaacatct taagacagca gtacaaatgg cagtattcat ccacaatttt 2700

aaaagaaaag gggggattgg ggggtacagt gcaggggaaa gaatagtaga cataatagca 2760

acagacatac aaactaaaga attacaaaaa caaattacaa aaattcaaaa ttttcgggtt 2820

tattacaggg acagcagaga tccactttgg aaaggaccag caaagcttct ctggaaaggt 2880

gaaggggcag tagtaataca agataatagt gacataaaag tagtgccaag aagaaaagca 2940

aagatcatta gggattatgg aaaacagatg gcaggtgatg attgtgtggc aagtagacag 3000

gatgaggatt ag 3012



<210> 35 

<211> 360 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstruction of clade B rev gene sequence

<400> 35 

atggcaggaa gaagcggaga cagcgacgaa gagctcctca agacagtcag actcatcaag 60 

tttctctatc aaagcaaccc gcctcccagc cccgagggga cccgacaggc ccgaaggaat 120 

agaagaagaa ggtggagaga gagacagaga cagatccgtt cgattagtga acggattctt 180 

agcacttatc tgggacgatc tgcggagcct gtgcctcttc agctaccacc gcttgagaga 240 

cttactcttg attgtagcga ggattgtgga acttctggga cgcagggggt gggaagtcct 300 

caaatattgg tggaatctcc tgcagtattg gagtcaggaa ctaaagaata gtgctgttag 360 

<210> 36 

<211> 360 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares center of tree reconstruction of clade B rev gene
sequence

<400> 36 

atggcaggaa gaagcggaga cagcgacgaa gagctcctca agacagtcag actcatcaag 60 

tttctctatc aaagcaaccc gcctcccagc cccgagggga cccgacaggc ccgaaggaat 120

cgaagaagaa ggtggagaga gagacagaga cagatccggt cgattagtga atggattctt 180 

agcacttatc tgggtcgacc tgcggagcct gtgcctcttc agctaccacc gcttgagaga 240 

cttactcttg attgtaacga ggattgtgga acttctggga cgcagggggt gggaagtcct 300 

caaatattgg tggaatctcc tacagtattg gagtcaggaa ctaaagaata gtgctgttag 360 

<210> 37 

<211> 360 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Minimum of means center of tree reconstruction of clade B rev gene
sequence

<400> 37 

atggcaggaa gaagcggaga cagcgacgaa gagctcctca agacagtcag actcatcaag 60 

tttctctatc aaagcaaccc gcctcccagc cccgagggga cccgacaggc ccgaaggaat 120 

cgaagaagaa ggtggagaga gagacagaga cagatccggt cgattagtga atggattctt 180 

agcacttatc tgggacgacc tgcggagcct gtgcctcttc agctaccacc gcttgagaga 240 

cttactcttg attgtagcga ggattgtgga acttctggga cgcagggggt gggaagtcct 300 

caaatattgg tggaatctcc tgcagtattg gagtcaggaa ctaaagaata gtgctgttag 360 

<210> 38 

<211> 321 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B tat gene
sequence

<400> 38 

atggagccag tagatcctag actagagccc tggaagcatc caggaagtca gcctaagact 60 

gcttgtacca attgctattg taaaaagtgt tgctatcatt gccaagtttg cttcataaca 120 

aaaggcttag gcatctccta tggcaggaag aagcggagac agcgacgaag acctcctcaa 180 

ggcagtcaga ctcatcaagt ttctctatca aagcaacccg cctcccagcc ccgaggggac 240 

ccgacaggcc cgaaggaatc gaagaagaag gtggagagag agacagagac agatccggtc 300 

gattagtgaa tggattctta g 321 

<210> 39

<211> 321 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> Least squares center of tree reconstruction of clade B tat gene
sequence

<400> 39 

atggagccag tagatcctag actagagccc tggaagcatc caggaagtca gcctaagact 60 

gcttgtacca attgctattg taaaaagtgt tgctttcatt gccaagtttg tttcataaca 120 

aaaggcttag gcatctccta tggcaggaag aagcggagac agcgacgaag agctcctcaa 180 

gacagtcaga ctcatcaagt ttctctatca aagcaacccg cctcccagcc ccgaggggac 240 

ccgacaggcc cgaaggaatc gaagaagaag gtggagagag agacagagac agatccggtc 300 

gattagtgaa tggattctta g 321 

<210> 40 

<211> 321 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Minimum of means center of tree reconstruction of clade B tat gene
sequence



<400> 40 

atggagccag tagatcctag actagagccc tggaagcatc caggaagtca gcctaagact 60

gcttgtacca attgctattg taaaaagtgt tgctttcatt gccaagtttg tttcataaca 120

aaaggcttag gcatctccta tggcaggaag aagcggagac agcgacgaag agctcctcaa 180

gacagtcaga ctcatcaagt ttctctatca aagcaacccg cctcccagcc ccgaggggac 240

ccgacaggcc cgaaggaatc gaagaagaag gtggagagag agacagagac agatccggtc 300

gattagtgga tggattctta t 321



<210> 41 
<211> 579 
<212> DNA 

<213> Artificial sequence 
<220>



<223> Most recent common ancestor reconstruction of clade B vif gene
sequence



<400> 41 

atggaaaaca gatggcaggt gatgattgtg tggcaagtag acaggatgag gattagaaca 60

tggaaaagtt tagtaaaaca ccatatgtat atttcaaaga aagctaaggg atggttttat 120

agacatcact atgaaagcac tcatccaaga ataagttcag aagtacacat cccactagga 180

gatgctagat tggtaataaa aacatattgg ggtctgcata caggagaaag agaatggcat 240

ttgggtcagg gagtctccat agaatggagg aaaaggagat atagcacaca agtagaccct 300

ggcctagcag accaactaat tcatctgtat tattttgatt gtttttcaga atctgctata 360

agaaatgcca tattaggaca tatagttagt cctaggtgtg aatatcaagc aggacataac 420

aaggtaggat ctctacagta cttggcacta acagcattaa taacaccaaa aaagataaag 480

ccacctttgc ctagtgttag gaaactgaca gaggatagat ggaacaagcc ccagaagacc 540

aagggccaca gagggagcca tacaatgaat ggacactag 579



<210> 42 

<211> 579 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares and minimum of means center of tree reconstructions 
of clade B vif gene sequence 



<400> 42 

atggaaaaca gatggcaggt gatgattgtg tggcaagtag acaggatgag gattagaaca 60

tggaaaagtt tagtaaaaca ccatatgtat atttcaagga aagctaaggg atggttttat 120

agacatcact atgaaagcac tcatccaaga ataagttcag aagtacacat cccactaggg 180

gatgctagat tggtaataac aacatattgg ggtctgcata caggagaaag agactggcat 240

ttgggtcagg gagtctccat agaatggagg aaaaagagat atagcacaca agtagaccct 300

gacctagcag accaactaat tcatctgtat tactttgatt gtttttcaga atctgctata 360

agaaatgcca tattaggaca tatagttagt cctaggtgtg aatatcaagc aggacataac 420

aaggtaggat ctctacagta cttggcacta gcagcattaa taacaccaaa aaagataaag 480

ccacctttgc ctagtgttac gaaactgaca gaggatagat ggaacaagcc ccagaagacc 540

aagggccaca gagggagcca tacaatgaat ggacactag 579



<210> 43 
<211> 291
<212> DNA 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstruction of clade B vpr gene
sequence



<400> 43 



atggaacaag ccccagaaga ccaagggcca cagagggagc catacaatga atggacacta 60

gagcttttag aggagcttaa gagtgaagct gttagacatt ttcctaggct atggctccat 120

agcttaggac aacatatcta tgaaacttat ggggatacct gggcaggagt ggaagctata 180

ataagaattc tgcaacaact gctgtttatt catttcagaa ttgggtgtca acatagcaga 240

ataggcatta ctcgacagag aagagcaaga aatggagcca gtagatccta g 291



<210> 44 

<211> 291 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> Least squares and minimum of means center of tree reconstruction 
of clade B vpr gene sequence 

<400> 44 

atggaacaag ccccagaaga ccaagggcca cagagggagc catacaatga atggacacta 60 

gagcttttag aggagcttaa gagtgaagct gttagacatt ttcctaggat atggctccat 120 

agcttaggac aacatatcta tgaaacttat ggggatactt gggcaggagt ggaagccata 180 

ataagaattc tgcaacaact gctgtttatt catttcagaa ttgggtgtcg acatagcaga 240 

ataggcatta ctcgacagag gagagcaaga aatggagcca gtagatccta g 291 

<210> 45 

<211> 246 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstruction of clade B vpu gene
sequence

<400> 45 

atgcaacctt tagaaatatt agcaatagta gcattagtag tagcagcaat actagcaata 60

gttgtgtgga ccatagtatt catagaatat aggaaaatat taaggcaaag aaaaatagac 120 

aggttaattg atagaataag agaaagagca gaagacagtg gcaatgagag tgaaggggat 180 

caggaagaat tatcagcact tgtggaaatg gggcaccatg ctccttggga tgttgatgat 240 

ctgtag 246 

<210> 46 

<211> 246 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares and minimum of means center of tree reconstructions 
of clade B vpu gene sequence 

<400> 46 

atgcaacctt tacaaatatt agcaatagta gcattagtag tagcagcaat aatagcaata 60 

gttgtgtgga ccatagtatt catagaatat aggaaaatat taagacaaag aaaaatagac 120 

aggttaattg atagaataag agaaagagca gaagacagtg gcaatgagag tgaaggggat 180 

caggaagaat tatcagcact tgtggagatg gggcaccatg ctccttggga tgttgatgat 240 

ctgtag 246 

<210> 47 

<211> 500 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B gag protein 
sequence 

<400> 47 

Met Gly Ala Arg Ala Ser Val Leu Ser Gly Gly Glu Leu Asp Lys Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys Lys Tyr Lys Leu Lys
20 25 30

His Ile Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Val Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Arg Gln Ile Leu Gly Gln Leu
50 55 60

Gln Pro Ser Leu Gln Thr Gly Ser Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Val Leu Tyr Cys Val His Gln Lys Ile Glu Val Lys Asp
85 90 95

Thr Lys Glu Ala Leu Asp Lys Ile Glu Glu Glu Gln Asn Lys Ser Lys
100 105 110

Lys Lys Ala Gln Gln Ala Ala Ala Asp Thr Gly Asn Ser Ser Gln Val
115 120 125

Ser Gln Asn Tyr Pro Ile Val Gln Asn Leu Gln Gly Gln Met Val His
130 135 140

Gln Ala Leu Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Ile Glu
145 150 155 160

Glu Lys Ala Phe Ser Pro Glu Val Ile Pro Met Phe Ser Ala Leu Ser
165 170 175

Glu Gly Ala Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly
180 185 190

Gly His Gln Ala Ala Met Gln Met Leu Lys Glu Thr Ile Asn Glu Glu
195 200 205

Ala Ala Glu Trp Asp Arg Leu His Pro Val His Ala Gly Pro Ile Ala
210 215 220

Pro Gly Gln Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr
225 230 235 240

Ser Thr Leu Gln Glu Gln Ile Ala Trp Met Thr Asn Asn Pro Pro Ile
245 250 255

Pro Val Gly Glu Ile Tyr Lys Arg Trp Ile Ile Leu Gly Leu Asn Lys
260 265 270

Ile Val Arg Met Tyr Ser Pro Val Ser Ile Leu Asp Ile Arg Gln Gly
275 280 285

Pro Lys Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Tyr Lys Thr Leu
290 295 300

Arg Ala Glu Gln Ala Ser Gln Glu Val Lys Asn Trp Met Thr Glu Thr
305 310 315 320

Leu Leu Val Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Lys Ala
325 330 335

Leu Gly Pro Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly
340 345 350

Val Gly Gly Pro Gly His Lys Ala Arg Val Leu Ala Glu Ala Met Ser
355 360 365

Gln Val Thr Asn Ser Ala Thr Ile Met Met Gln Arg Gly Asn Phe Arg
370 375 380

Asn Pro Arg Lys Thr Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His
385 390 395 400

Ile Ala Arg Asn Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys
405 410 415

Gly Lys Glu Gly His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn
420 425 430

Phe Leu Gly Lys Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe
435 440 445

Leu Gln Ser Arg Pro Glu Pro Thr Ala Pro Pro Glu Glu Ser Phe Arg
450 455 460

Phe Gly Glu Glu Thr Thr Thr Pro Ser Gln Lys Gln Glu Gln Lys Asp
465 470 475 480

Lys Glu Leu Tyr Pro Leu Ala Ser Leu Lys Ser Leu Phe Gly Asn Asp
485 490 495

Pro Ser Ser Gln
500

<210> 48 
<211> 500 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares center of tree reconstruction of clade B gag protein
sequence

<400> 48 

Met Gly Ala Arg Ala Ser Val Leu Ser Gly Gly Glu Leu Asp Arg Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys Lys Tyr Arg Leu Lys
20 25 30

His Ile Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Val Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Arg Gln Ile Leu Gly Gln Leu
50 55 60

Gln Pro Ser Leu Gln Thr Gly Ser Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Thr Leu Tyr Cys Val His Gln Arg Ile Glu Val Lys Asp
85 90 95

Thr Lys Glu Ala Leu Glu Lys Ile Glu Glu Glu Gln Asn Lys Ser Lys
100 105 110

Lys Lys Ala Gln Gln Ala Ala Ala Asp Thr Gly Asn Ser Ser Gln Val
115 120 125

Ser Gln Asn Tyr Pro Ile Val Gln Asn Leu Gln Gly Gln Met Val His
130 135 140

Gln Ala Ile Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Val Glu
145 150 155 160

Glu Lys Ala Phe Ser Pro Glu Val Ile Pro Met Phe Ser Ala Leu Ser
165 170 175

Glu Gly Ala Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly
180 185 190

Gly His Gln Ala Ala Met Gln Met Leu Lys Glu Thr Ile Asn Glu Glu
195 200 205

Ala Ala Glu Trp Asp Arg Leu His Pro Val His Ala Gly Pro Ile Ala
210 215 220

Pro Gly Gln Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr
225 230 235 240

Ser Thr Leu Gln Glu Gln Ile Gly Trp Met Thr Asn Asn Pro Pro Ile
245 250 255

Pro Val Gly Glu Ile Tyr Lys Arg Trp Ile Ile Leu Gly Leu Asn Lys
260 265 270

Ile Val Arg Met Tyr Ser Pro Thr Ser Ile Leu Asp Ile Arg Gln Gly
275 280 285

Pro Lys Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Tyr Lys Thr Leu
290 295 300

Arg Ala Glu Gln Ala Ser Gln Glu Val Lys Asn Trp Met Thr Glu Thr
305 310 315 320

Leu Leu Val Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Lys Ala
325 330 335

Leu Gly Pro Ala Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly
340 345 350

Val Gly Gly Pro Gly His Lys Ala Arg Val Leu Ala Glu Ala Met Ser
355 360 365

Gln Val Thr Asn Ser Ala Thr Ile Met Met Gln Arg Gly Asn Phe Arg
370 375 380

Asn Gln Arg Lys Thr Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His
385 390 395 400

Ile Ala Lys Asn Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys
405 410 415

Gly Lys Glu Gly His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn
420 425 430

Phe Leu Gly Lys Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe
435 440 445

Leu Gln Ser Arg Pro Glu Pro Thr Ala Pro Pro Glu Glu Ser Phe Arg
450 455 460

Phe Gly Glu Glu Thr Thr Thr Pro Ser Gln Lys Gln Glu Pro Ile Asp
465 470 475 480

Lys Glu Leu Tyr Pro Leu Ala Ser Leu Arg Ser Leu Phe Gly Asn Asp
485 490 495

Pro Ser Ser Gln
500

<210> 49 
<211> 500 
<212> PRT 

<213> Artificial sequence 



<220> 
<223> Minimum of means center of tree reconstruction of clade B gag
protein sequence

<400> 49 

Met Gly Ala Gly Ala Ser Val Leu Ser Gly Gly Lys Leu Asp Arg Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys Lys Tyr Lys Leu Lys
20 25 30

His Ile Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Val Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Arg Arg Ile Leu Glu Gln Leu
50 55 60

His Pro Ser Leu Gln Thr Gly Ser Glu Glu Leu Lys Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Thr Leu Tyr Cys Val His Gln Asn Ile Glu Val Arg Asp
85 90 95

Thr Lys Asp Ala Leu Glu Lys Ile Glu Glu Glu Gln Asn Lys Ile Lys
100 105 110

Lys Arg Ala Gln Gln Ala Ala Ala Asp Thr Gly Asn Ser Asn Pro Val
115 120 125

Ser Gln Asn Tyr Pro Ile Val Gln Asn Met Gln Gly Gln Met Val His
130 135 140

Gln Ala Ile Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Val Glu
145 150 155 160

Glu Lys Ala Phe Ser Pro Glu Val Ile Pro Met Phe Ser Ala Leu Ser
165 170 175

Glu Gly Ala Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly
180 185 190

Gly His Gln Ala Ala Met Gln Met Leu Lys Glu Thr Ile Asn Glu Glu
195 200 205

Ala Ala Glu Trp Asp Arg Leu His Pro Val His Ala Gly Pro Ile Ala
210 215 220

Pro Gly Gln Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr
225 230 235 240

Ser Thr Leu Gln Glu Gln Ile Gly Trp Met Thr His Asn Pro Pro Ile
245 250 255

Pro Val Gly Glu Ile Tyr Lys Arg Trp Ile Ile Met Gly Leu Asn Lys
260 265 270

Ile Val Arg Met Tyr Ser Pro Thr Ser Ile Leu Asp Ile Arg Gln Gly
275 280 285

Pro Lys Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Tyr Lys Thr Leu
290 295 300

Arg Ala Glu Gln Ala Ser Gln Glu Val Lys Asn Trp Met Thr Glu Thr
305 310 315 320

Leu Leu Val Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Lys Ala
325 330 335

Leu Gly Pro Ala Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly
340 345 350

Val Gly Gly Pro Ser His Lys Ala Arg Val Leu Ala Glu Ala Met Ser
355 360 365

Gln Ala Thr Asn Ser Ala Thr Ile Met Met Gln Arg Gly Asn Phe Lys
370 375 380

Gly Gln Arg Lys Thr Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His
385 390 395 400

Ile Ala Arg Asn Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys
405 410 415

Gly Lys Glu Gly His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn
420 425 430

Phe Leu Gly Lys Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe
435 440 445

Leu Gln Ser Arg Pro Glu Pro Thr Ala Pro Pro Glu Glu Ser Phe Arg
450 455 460

Phe Gly Glu Glu Thr Thr Thr Pro Pro Gln Lys Gln Glu Pro Arg Asp
465 470 475 480

Lys Glu Gln Tyr Pro Leu Thr Ser Leu Arg Ser Leu Phe Gly Asn Asp
485 490 495

Pro Ser Ser Gln
500

<210> 50 
<211> 862 
<212> PRT

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstruction of clade B gp 160
protein sequence

<400> 50 

Met Arg Val Lys Gly Ile Arg Lys Asn Cys Gln His Leu Trp Lys Trp
1 5 10 15

Gly Thr Met Leu Leu Gly Met Leu Met Ile Cys Ser Ala Ala Glu Asn
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Thr
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Lys Thr Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Val Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asn Met Val Glu Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Thr Asp Ala Asn Lys Asn Ala Thr Asn Thr Asn Ser Ser Ser
130 135 140

Gly Gly Thr Met Glu Lys Gly Glu Met Lys Asn Cys Ser Phe Asn Ile
145 150 155 160

Thr Thr Ser Ile Arg Asp Lys Met Gln Lys Glu Tyr Ala Leu Phe Tyr
165 170 175

Lys Leu Asp Val Val Pro Ile Asp Asn Asp Asn Asn Ser Asn Asn Asn
180 185 190

Thr Asn Tyr Arg Leu Ile Asn Cys Asn Thr Ser Val Ile Thr Gln Ala
195 200 205

Cys Pro Lys Val Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Thr Pro
210 215 220




16336-13-2.ST25.txt 

Ala Gly Phe Ala Ile Leu Lys Cys Asn Asp Lys Lys Phe Asn Gly Thr
225 230 235 240

Gly Pro Cys Lys Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Arg
245 250 255

Pro Val Val Ser Thr Gln Leu Leu Leu Asn Gly Ser Leu Ala Glu Glu
260 265 270

Glu Val Val Ile Arg Ser Glu Asn Phe Thr Asp Asn Ala Lys Thr Ile
275 280 285

Ile Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn
290 295 300

Asn Asn Thr Arg Lys Ser Ile Pro Ile Gly Pro Gly Arg Ala Leu Tyr
305 310 315 320

Thr Thr Gly Glu Ile Ile Gly Asp Ile Arg Gln Ala His Cys Asn Ile
325 330 335

Ser Arg Ala Lys Trp Asn Asn Thr Leu Lys Gln Val Val Thr Lys Leu
340 345 350

Arg Glu Gln Phe Gly Asn Asn Lys Thr Ile Val Phe Asn Pro Ser Ser
355 360 365

Gly Gly Asp Pro Glu Ile Val Met His Ser Phe Asn Cys Gly Gly Glu
370 375 380

Phe Phe Tyr Cys Asn Thr Thr Gln Leu Phe Asn Ser Thr Trp Asn Ser
385 390 395 400

Thr Glu Gly Ser Asn Lys Thr Thr Gly Ser Asn Asn Thr Gly Gly Glu
405 410 415

Thr Ile Thr Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met Trp Gln
420 425 430

Glu Val Gly Lys Ala Met Tyr Ala Pro Pro Ile Arg Gly Gln Ile Lys
435 440 445

Cys Ser Ser Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly Gly Glu
450 455 460

Asn Ser Thr Asn Glu Thr Glu Ile Phe Arg Pro Gly Gly Gly Asp Met
465 470 475 480

Arg Asp Asn Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Lys Ile
485 490 495

Glu Pro Leu Gly Val Ala Pro Thr Lys Ala Lys Arg Arg Val Val Gln
500 505 510

Arg Glu Lys Arg Ala Val Gly Ile Ile Gly Ala Met Phe Leu Gly Phe
515 520 525

Leu Gly Ala Ala Gly Ser Thr Met Gly Ala Ala Ser Met Thr Leu Thr
530 535 540

Val Gln Ala Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn
545 550 555 560

Leu Leu Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val
565 570 575

Trp Gly Ile Lys Gln Leu Gln Ala Arg Val Leu Ala Val Glu Arg Tyr
580 585 590

Leu Arg Asp Gln Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu
595 600 605

Ile Cys Thr Thr Thr Val Pro Trp Asn Ala Ser Trp Ser Asn Lys Ser
610 615 620

Leu Asp Lys Ile Trp Asn Asn Met Thr Trp Met Glu Trp Glu Arg Glu
625 630 635 640

Ile Asp Asn Tyr Thr Gly Leu Ile Tyr Asn Leu Ile Glu Glu Ser Gln
645 650 655

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Lys Trp
660 665 670

Ala Ser Leu Trp Asn Trp Phe Asp Ile Thr Gln Trp Leu Trp Tyr Ile
675 680 685

Lys Ile Phe Ile Met Ile Val Gly Gly Leu Val Gly Leu Arg Ile Val
690 695 700

Phe Ala Val Leu Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro
705 710 715 720

Leu Ser Phe Gln Thr Arg Leu Pro Ala Pro Arg Gly Pro Asp Arg Pro
725 730 735

Glu Gly Ile Glu Glu Glu Gly Gly Glu Arg Asp Arg Asp Arg Ser Gly
740 745 750

Arg Leu Val Asn Gly Phe Leu Ala Leu Ile Trp Asp Asp Leu Arg Ser
755 760 765

Leu Cys Leu Phe Ser Tyr His Arg Leu Arg Asp Leu Leu Leu Ile Val
770 775 780

Ala Arg Ile Val Glu Leu Leu Gly Arg Arg Gly Trp Glu Ala Leu Lys
785 790 795 800

Tyr Trp Trp Asn Leu Leu Gln Tyr Trp Ser Gln Glu Leu Lys Asn Ser
805 810 815

Ala Val Ser Leu Leu Asn Ala Thr Ala Ile Ala Val Ala Glu Gly Thr
820 825 830

Asp Arg Val Ile Glu Val Val Gln Arg Ala Cys Arg Ala Ile Leu His
835 840 845

Ile Pro Arg Arg Ile Arg Gln Gly Leu Glu Arg Ala Leu Leu
850 855 860

<210> 51 

<211> 862 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares and minimum of means center of tree reconstruction 
of clade B gp 160 protein sequence 

<400> 51 

Met Arg Val Lys Gly Ile Arg Lys Asn Tyr Gln His Leu Trp Arg Trp
1 5 10 15

Gly Thr Met Leu Leu Gly Met Leu Met Ile Cys Ser Ala Ala Glu Lys
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Thr
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Asp Thr Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Val Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asn Met Val Glu Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Thr Asp Leu Asn Lys Asn Ala Thr Asn Thr Asn Ser Ser Ser
130 135 140

Gly Glu Met Met Glu Lys Gly Glu Ile Lys Asn Cys Ser Phe Asn Ile
145 150 155 160

Thr Thr Ser Ile Arg Asp Lys Val Gln Lys Glu Tyr Ala Leu Phe Tyr
165 170 175

Lys Leu Asp Val Val Pro Ile Asp Asn Asp Asn Asn Thr Asn Asn Thr
180 185 190

Thr Ser Tyr Arg Leu Ile Ser Cys Asn Thr Ser Val Ile Thr Gln Ala
195 200 205

Cys Pro Lys Val Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Ala Pro
210 215 220

Ala Gly Phe Ala Ile Leu Lys Cys Asn Asp Lys Lys Phe Asn Gly Thr
225 230 235 240

Gly Pro Cys Thr Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Arg
245 250 255

Pro Val Val Ser Thr Gln Leu Leu Leu Asn Gly Ser Leu Ala Glu Glu
260 265 270

Glu Val Val Ile Arg Ser Asp Asn Phe Thr Asp Asn Ala Lys Thr Ile
275 280 285

Ile Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn
290 295 300

Asn Asn Thr Arg Lys Ser Ile His Ile Gly Pro Gly Arg Ala Phe Tyr
305 310 315 320

Thr Thr Gly Glu Ile Ile Gly Asp Ile Arg Gln Ala His Cys Asn Ile
325 330 335

Ser Arg Ala Lys Trp Asn Asn Thr Leu Lys Gln Ile Val Lys Lys Leu
340 345 350

Arg Glu Gln Phe Gly Asn Asn Lys Thr Ile Val Phe Asn Gln Ser Ser
355 360 365

Gly Gly Asp Pro Glu Ile Val Met His Ser Phe Asn Cys Gly Gly Glu
370 375 380

Phe Phe Tyr Cys Asn Ser Thr Gln Leu Phe Asn Ser Thr Trp Asn Gly
385 390 395 400

Thr Trp Thr Trp Asn Thr Thr Glu Gly Ser Asn Asp Thr Glu Gly Asp
405 410 415

Thr Ile Thr Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met Trp Gln
420 425 430

Glu Val Gly Lys Ala Met Tyr Ala Pro Pro Ile Arg Gly Gln Ile Arg
435 440 445

Cys Ser Ser Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly Gly Asn
450 455 460

Asn Asn Thr Asn Glu Thr Glu Ile Phe Arg Pro Gly Gly Gly Asp Met
465 470 475 480

Arg Asp Asn Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Lys Ile
485 490 495

Glu Pro Leu Gly Val Ala Pro Thr Lys Ala Lys Arg Arg Val Val Gln
500 505 510

Arg Glu Lys Arg Ala Val Gly Ile Ile Gly Ala Val Phe Leu Gly Phe
515 520 525

Leu Gly Ala Ala Gly Ser Thr Met Gly Ala Ala Ser Met Thr Leu Thr
530 535 540

Val Gln Ala Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn
545 550 555 560

Leu Leu Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val
565 570 575

Trp Gly Ile Lys Gln Leu Gln Ala Arg Val Leu Ala Val Glu Arg Tyr
580 585 590

Leu Arg Asp Gln Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu
595 600 605

Ile Cys Thr Thr Ala Val Pro Trp Asn Ala Ser Trp Ser Asn Lys Ser
610 615 620

Leu Asp Glu Ile Trp Asn Asn Met Thr Trp Met Glu Trp Glu Arg Glu
625 630 635 640

Ile Asp Asn Tyr Thr Ser Leu Ile Tyr Thr Leu Ile Glu Glu Ser Gln
645 650 655

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Lys Trp
660 665 670

Ala Ser Leu Trp Asn Trp Phe Asp Ile Thr Asn Trp Leu Trp Tyr Ile
675 680 685

Lys Ile Phe Ile Met Ile Val Gly Gly Leu Val Gly Leu Arg Ile Val
690 695 700

Phe Ala Val Leu Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro
705 710 715 720

Leu Ser Phe Gln Thr Arg Leu Pro Ala Pro Arg Gly Pro Asp Arg Pro
725 730 735

Glu Gly Ile Glu Glu Glu Gly Gly Glu Arg Asp Arg Asp Arg Ser Gly
740 745 750

Arg Leu Val Asn Gly Phe Leu Ala Leu Ile Trp Asp Asp Leu Arg Ser
755 760 765

Leu Cys Leu Phe Ser Tyr His Arg Leu Arg Asp Leu Leu Leu Ile Val
770 775 780

Thr Arg Ile Val Glu Leu Leu Gly Arg Arg Gly Trp Glu Ala Leu Lys
785 790 795 800

Tyr Trp Trp Asn Leu Leu Gln Tyr Trp Ser Gln Glu Leu Lys Asn Ser
805 810 815

Ala Val Ser Leu Leu Asn Ala Thr Ala Ile Ala Val Ala Glu Gly Thr
820 825 830

Asp Arg Val Ile Glu Val Val Gln Arg Ala Cys Arg Ala Ile Leu His
835 840 845

Ile Pro Thr Arg Ile Arg Gln Gly Leu Glu Arg Ala Leu Leu
850 855 860

<210> 52 

<211> 206 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B nef protein 
sequence
<400> 52 

Met Gly Gly Lys Trp Ser Lys Arg Ser Val Val Gly Trp Pro Ala Val
1 5 10 15

Arg Glu Arg Met Arg Arg Ala Glu Pro Ala Ala Asp Gly Val Gly Ala
20 25 30

Val Ser Arg Asp Leu Glu Lys His Gly Ala Ile Thr Ser Ser Asn Thr
35 40 45

Ala Ala Thr Asn Ala Ala Cys Ala Trp Leu Glu Ala Gln Glu Glu Glu
50 55 60

Glu Val Gly Phe Pro Val Arg Pro Gln Val Pro Leu Arg Pro Met Thr
65 70 75 80

Tyr Lys Ala Ala Val Asp Leu Ser His Phe Leu Lys Glu Lys Gly Gly
85 90 95

Leu Glu Gly Leu Val Tyr Ser Gln Lys Arg Gln Asp Ile Leu Asp Leu
100 105 110

Trp Val Tyr His Thr Gln Gly Tyr Phe Pro Asp Trp Gln Asn Tyr Thr
115 120 125

Pro Gly Pro Gly Thr Arg Tyr Pro Leu Thr Phe Gly Trp Cys Phe Lys
130 135 140

Leu Val Pro Val Glu Pro Glu Lys Val Glu Glu Ala Thr Glu Gly Glu
145 150 155 160

Asn Asn Ser Leu Leu His Pro Met Ser Leu His Gly Met Asp Asp Pro
165 170 175

Glu Arg Glu Val Leu Val Trp Arg Phe Asp Ser Arg Leu Ala Phe His
180 185 190

His Met Ala Arg Glu Lys His Pro Glu Tyr Tyr Lys Asp Cys
195 200 205

<210> 53 

<211> 206 

<212> PRT 

<213> Artificial sequence 



<220> 
<223> Least squares center of tree reconstruction of clade B nef protein
sequence

<400> 53 

Met Gly Gly Lys Trp Ser Lys Arg Ser Val Val Gly Trp Pro Ala Val
1 5 10 15

Arg Glu Arg Met Arg Arg Ala Glu Pro Ala Ala Asp Gly Val Gly Ala
20 25 30

Val Ser Arg Asp Leu Glu Lys His Gly Ala Ile Thr Ser Ser Asn Thr
35 40 45

Ala Ala Thr Asn Ala Asp Cys Ala Trp Leu Glu Ala Gln Glu Glu Glu
50 55 60

Glu Val Gly Phe Pro Val Arg Pro Gln Val Pro Leu Arg Pro Met Thr
65 70 75 80

Tyr Lys Ala Ala Leu Asp Leu Ser His Phe Leu Lys Glu Lys Gly Gly
85 90 95

Leu Glu Gly Leu Ile Tyr Ser Gln Lys Arg Gln Asp Ile Leu Asp Leu
100 105 110

Trp Val Tyr His Thr Gln Gly Tyr Phe Pro Asp Trp Gln Asn Tyr Thr
115 120 125

Pro Gly Pro Gly Ile Arg Tyr Pro Leu Thr Phe Gly Trp Cys Phe Lys
130 135 140

Leu Val Pro Val Glu Pro Glu Lys Val Glu Glu Ala Asn Glu Gly Glu
145 150 155 160

Asn Asn Ser Leu Leu His Pro Met Ser Leu His Gly Met Asp Asp Pro
165 170 175

Glu Lys Glu Val Leu Val Trp Lys Phe Asp Ser Arg Leu Ala Phe His
180 185 190

His Met Ala Arg Glu Leu His Pro Glu Tyr Tyr Lys Asp Cys
195 200 205

<210> 54 

<211> 206 

<212> PRT 

<213> Artificial sequence 



<220> 
<223> Minimum of means center of tree reconstruction of clade B nef
protein sequence

<400> 54 

Met Gly Gly Lys Trp Ser Lys Arg Ser Val Val Gly Trp Pro Ala Val
1 5 10 15

Arg Glu Arg Met Arg Arg Ala Glu Pro Ala Ala Asp Gly Val Gly Ala
20 25 30

Val Ser Arg Asp Leu Glu Lys His Gly Ala Ile Thr Ser Ser Asn Thr
35 40 45

Ala Ala Thr Asn Ala Asp Cys Ala Trp Leu Glu Ala Gln Glu Glu Glu
50 55 60

Glu Val Gly Phe Pro Val Arg Pro Gln Val Pro Leu Arg Pro Met Thr
65 70 75 80

Tyr Lys Ala Ala Leu Asp Leu Ser His Phe Leu Lys Glu Lys Gly Gly
85 90 95

Leu Glu Gly Leu Ile Tyr Ser Gln Lys Arg Gln Asp Ile Leu Asp Leu
100 105 110

Trp Val Tyr His Thr Gln Gly Tyr Phe Pro Asp Trp Gln Asn Tyr Thr
115 120 125

Pro Gly Pro Gly Ile Arg Tyr Pro Leu Thr Phe Gly Trp Cys Phe Lys
130 135 140

Leu Val Pro Val Glu Pro Glu Lys Val Glu Glu Ala Asn Glu Gly Glu
145 150 155 160

Asn Asn Cys Leu Leu His Pro Met Ser Gln His Gly Met Asp Asp Pro
165 170 175

Glu Lys Glu Val Leu Val Trp Lys Phe Asp Ser Arg Leu Ala Phe His
180 185 190

His Met Ala Arg Glu Leu His Pro Glu Tyr Tyr Lys Asp Cys
195 200 205

<210> 55 

<211> 1003 

<212> PRT 

<213> Artificial sequence 
<220>

<223> Most recent common ancestor reconstruction of clade B pol protein 
sequence 

<400> 55 

Phe Phe Arg Glu Asn Leu Ala Phe Pro Gln Gly Lys Ala Arg Glu Leu
1 5 10 15

Ser Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Arg Arg Glu Leu Gln
20 25 30

Val Trp Gly Arg Asp Asn Asn Ser Leu Ser Glu Ala Gly Ala Asp Arg
35 40 45

Gln Gly Thr Val Ser Phe Ser Phe Pro Gln Ile Thr Leu Trp Gln Arg
50 55 60

Pro Leu Val Thr Ile Lys Ile Gly Gly Gln Leu Lys Glu Ala Leu Leu
65 70 75 80

Asp Thr Gly Ala Asp Asp Thr Val Leu Glu Glu Met Asn Leu Pro Gly
85 90 95

Lys Trp Lys Pro Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val
100 105 110

Arg Gln Tyr Asp Gln Ile Pro Ile Glu Ile Cys Gly His Lys Ala Ile
115 120 125

Gly Thr Val Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn
130 135 140

Leu Leu Thr Gln Ile Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile
145 150 155 160

Glu Thr Val Pro Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val
165 170 175

Lys Gln Trp Pro Leu Thr Glu Glu Lys Ile Lys Ala Leu Val Glu Ile
180 185 190

Cys Thr Glu Met Glu Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu
195 200 205

Asn Pro Tyr Asn Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr
210 215 220

Lys Trp Arg Lys Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln
225 230 235 240

Asp Phe Trp Glu Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys
245 250 255

Lys Lys Lys Ser Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser
260 265 270

Val Pro Leu Asp Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro
275 280 285

Ser Ile Asn Asn Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu
290 295 300

Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr
305 310 315 320

Lys Ile Leu Glu Pro Phe Arg Lys Gln Asn Pro Glu Ile Val Ile Tyr
325 330 335

Gln Tyr Met Asp Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln
340 345 350

His Arg Thr Lys Ile Glu Glu Leu Arg Glu His Leu Leu Arg Trp Gly
355 360 365

Phe Thr Thr Pro Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp
370 375 380

Met Gly Tyr Glu Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Val
385 390 395 400

Leu Pro Glu Lys Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val
405 410 415

Gly Lys Leu Asn Trp Ala Ser Gln Ile Tyr Ala Gly Ile Lys Val Lys
420 425 430

Gln Leu Cys Lys Leu Leu Arg Gly Thr Lys Ala Leu Thr Glu Val Val
435 440 445

Pro Leu Thr Glu Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile
450 455 460

Leu Lys Glu Pro Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu
465 470 475 480

Ile Ala Glu Ile Gln Lys Gln Gly Gln Gly Gln Trp Thr Tyr Gln Ile
485 490 495

Tyr Gln Glu Pro Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Met
500 505 510

Arg Gly Ala His Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln
515 520 525

Lys Ile Ala Thr Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe
530 535 540

Lys Leu Pro Ile Gln Lys Glu Thr Trp Glu Ala Trp Trp Thr Glu Tyr
545 550 555 560

Trp Gln Ala Thr Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro
565 570 575

Leu Val Lys Leu Trp Tyr Gln Leu Glu Lys Glu Pro Ile Val Gly Ala
580 585 590

Glu Thr Phe Tyr Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Leu Gly
595 600 605

Lys Ala Gly Tyr Val Thr Asp Arg Gly Arg Gln Lys Val Val Ser Leu
610 615 620

Thr Asp Thr Thr Asn Gln Lys Thr Glu Leu Gln Ala Ile His Leu Ala
625 630 635 640

Leu Gln Asp Ser Gly Leu Glu Val Asn Ile Val Thr Asp Ser Gln Tyr
645 650 655

Ala Leu Gly Ile Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu
660 665 670

Val Ser Gln Ile Ile Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu
675 680 685

Ala Trp Val Pro Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp
690 695 700

Lys Leu Val Ser Thr Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile
705 710 715 720

Asp Lys Ala Gln Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala
725 730 735

Met Ala Ser Asp Phe Asn Leu Pro Pro Val Val Ala Lys Glu Ile Val
740 745 750

Ala Ser Cys Asp Lys Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln
755 760 765

Val Asp Cys Ser Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu
770 775 780

Gly Lys Val Ile Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu
785 790 795 800

Ala Glu Val Ile Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu
805 810 815

Leu Lys Leu Ala Gly Arg Trp Pro Val Lys Val Ile His Thr Asp Asn
820 825 830

Gly Ser Asn Phe Thr Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala
835 840 845

Gly Ile Lys Gln Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly
850 855 860

Val Val Glu Ser Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val
865 870 875 880

Arg Asp Gln Ala Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe
885 890 895

Ile His Asn Phe Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly
900 905 910

Glu Arg Ile Val Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu
915 920 925

Gln Lys Gln Ile Thr Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp
930 935 940

Ser Arg Asp Pro Leu Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly
945 950 955 960

Glu Gly Ala Val Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro
965 970 975

Arg Arg Lys Ala Lys Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly
980 985 990 

Asp Asp cys val Ala ser Arg Gin Asp Glu Asp 
995 1000 

<210> 56 

<211> 1003 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares center of tree reconstruction of clade B pol protein sequence
<400> 56 

Phe Phe Arg Glu Asp Leu Ala Phe Pro Gln Gly Lys Ala Arg Glu Phe
1 5 10 15

Ser Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Arg Arg Glu Leu Gln
20 25 30

Val Trp Gly Arg Asp Asn Asn Ser Leu Ser Glu Ala Gly Ala Asp Arg
35 40 45

Gln Gly Thr Val Ser Phe Ser Phe Pro Gln Ile Thr Leu Trp Gln Arg
50 55 60

Pro Leu Val Thr Ile Lys Ile Gly Gly Gln Leu Lys Glu Ala Leu Leu
65 70 75 80

Asp Thr Gly Ala Asp Asp Thr Val Leu Glu Glu Met Asn Leu Pro Gly
85 90 95

Arg Trp Lys Pro Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val
100 105 110

Arg Gln Tyr Asp Gln Ile Pro Ile Glu Ile Cys Gly His Lys Ala Ile
115 120 125

Gly Thr Val Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn
130 135 140

Leu Leu Thr Gln Ile Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile
145 150 155 160

Glu Thr Val Pro Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val
165 170 175

Lys Gln Trp Pro Leu Thr Glu Glu Lys Ile Lys Ala Leu Val Glu Ile
180 185 190

Cys Thr Glu Met Glu Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu
195 200 205

Asn Pro Tyr Asn Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr
210 215 220

Lys Trp Arg Lys Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln
225 230 235 240

Asp Phe Trp Glu Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys
245 250 255

Lys Lys Lys Ser Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser
260 265 270

Val Pro Leu Asp Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro
275 280 285

Ser Ile Asn Asn Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu
290 295 300

Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr
305 310 315 320

Lys Ile Leu Glu Pro Phe Arg Lys Gln Asn Pro Asp Ile Val Ile Tyr
325 330 335

Gln Tyr Met Asp Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln
340 345 350

His Arg Thr Lys Ile Glu Glu Leu Arg Gln His Leu Leu Arg Trp Gly
355 360 365

Phe Thr Thr Pro Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp
370 375 380

Met Gly Tyr Glu Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Val
385 390 395 400

Leu Pro Glu Lys Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val
405 410 415

Gly Lys Leu Asn Trp Ala Ser Gln Ile Tyr Ala Gly Ile Lys Val Lys
420 425 430

Gln Leu Cys Lys Leu Leu Arg Gly Thr Lys Ala Leu Thr Glu Val Ile
435 440 445

Pro Leu Thr Glu Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile
450 455 460

Leu Lys Glu Pro Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu
465 470 475 480

Ile Ala Glu Ile Gln Lys Gln Gly Gln Gly Gln Trp Thr Tyr Gln Ile
485 490 495

Tyr Gln Glu Pro Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Met
500 505 510

Arg Gly Ala His Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln
515 520 525

Lys Ile Ala Thr Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe
530 535 540

Lys Leu Pro Ile Gln Lys Glu Thr Trp Glu Ala Trp Trp Thr Glu Tyr
545 550 555 560

Trp Gln Ala Thr Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro
565 570 575

Leu Val Lys Leu Trp Tyr Gln Leu Glu Lys Glu Pro Ile Val Gly Ala
580 585 590

Glu Thr Phe Tyr Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Leu Gly
595 600 605

Lys Ala Gly Tyr Val Thr Asp Arg Gly Arg Gln Lys Val Val Ser Leu
610 615 620

Thr Asp Thr Thr Asn Gln Lys Thr Glu Leu Gln Ala Ile His Leu Ala
625 630 635 640

Leu Gln Asp Ser Gly Leu Glu Val Asn Ile Val Thr Asp Ser Gln Tyr
645 650 655

Ala Leu Gly Ile Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu
660 665 670

Val Ser Gln Ile Ile Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu
675 680 685

Ala Trp Val Pro Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp
690 695 700

Lys Leu Val Ser Ala Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile
705 710 715 720

Asp Lys Ala Gln Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala
725 730 735

Met Ala Ser Asp Phe Asn Leu Pro Pro Val Val Ala Lys Glu Ile Val
740 745 750

Ala Ser Cys Asp Lys Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln
755 760 765

Val Asp Cys Ser Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu
770 775 780

Gly Lys Val Ile Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu
785 790 795 800

Ala Glu Val Ile Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu
805 810 815

Leu Lys Leu Ala Gly Arg Trp Pro Val Lys Thr Ile His Thr Asp Asn
820 825 830

Gly Ser Asn Phe Thr Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala
835 840 845

Gly Ile Lys Gln Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly
850 855 860

Val Val Glu Ser Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val
865 870 875 880

Arg Asp Gln Ala Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe
885 890 895

Ile His Asn Phe Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly
900 905 910

Glu Arg Ile Val Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu
915 920 925

Gln Lys Gln Ile Thr Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp
930 935 940

Ser Arg Asp Pro Leu Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly
945 950 955 960

Glu Gly Ala Val Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro
965 970 975

Arg Arg Lys Ala Lys Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly
980 985 990

Asp Asp Cys Val Ala Ser Arg Gln Asp Glu Asp
995 1000

<210> 57 

<211> 1003 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Minimum of means center of tree reconstruction of clade B pol protein sequence

<400> 57 

Phe Phe Arg Glu Asp Leu Ala Phe Pro Gln Gly Lys Ala Arg Glu Phe
1 5 10 15

Ser Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Arg Arg Glu Leu Gln
20 25 30

Val Trp Gly Arg Asp Asn Asn Ser Leu Ser Glu Ala Gly Ala Asp Arg
35 40 45

Gln Gly Thr Val Ser Phe Ser Phe Pro Gln Ile Thr Leu Trp Gln Arg
50 55 60

Pro Leu Val Thr Ile Lys Ile Gly Gly Gln Leu Lys Glu Ala Leu Leu
65 70 75 80

Asp Thr Gly Ala Asp Asp Thr Val Leu Glu Glu Met Asn Leu Pro Gly
85 90 95

Arg Trp Lys Pro Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val
100 105 110

Arg Gln Tyr Asp Gln Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile
115 120 125

Gly Thr Val Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn
130 135 140

Leu Leu Thr Gln Ile Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile
145 150 155 160

Glu Thr Val Pro Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val
165 170 175

Lys Gln Trp Pro Leu Thr Glu Glu Lys Ile Lys Ala Leu Val Glu Ile
180 185 190

Cys Thr Glu Met Glu Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu
195 200 205

Asn Pro Tyr Asn Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr
210 215 220

Lys Trp Arg Lys Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln
225 230 235 240

Asp Phe Trp Glu Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys
245 250 255

Lys Lys Lys Ser Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser
260 265 270

Val Pro Leu Asp Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro
275 280 285

Ser Ile Asn Asn Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu
290 295 300

Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr
305 310 315 320

Lys Ile Leu Glu Pro Phe Arg Lys Gln Asn Pro Asp Ile Val Ile Tyr
325 330 335

Gln Tyr Met Asp Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln
340 345 350

His Arg Thr Lys Ile Glu Glu Leu Arg Gln His Leu Leu Arg Trp Gly
355 360 365

Phe Thr Thr Pro Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp
370 375 380

Met Gly Tyr Glu Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Val
385 390 395 400

Leu Pro Glu Lys Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val
405 410 415

Gly Lys Leu Asn Trp Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Lys
420 425 430

Gln Leu Cys Lys Leu Leu Arg Gly Thr Lys Ala Leu Thr Glu Val Ile
435 440 445

Pro Leu Thr Glu Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile
450 455 460

Leu Lys Glu Pro Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu
465 470 475 480

Ile Ala Glu Ile Gln Lys Gln Gly Gln Gly Gln Trp Thr Tyr Gln Ile
485 490 495

Tyr Gln Glu Pro Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Met
500 505 510

Arg Gly Ala His Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln
515 520 525

Lys Ile Ala Thr Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe
530 535 540

Lys Leu Pro Ile Gln Lys Glu Thr Trp Glu Thr Trp Trp Thr Glu Tyr
545 550 555 560

Trp Gln Ala Thr Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro
565 570 575

Leu Val Lys Leu Trp Tyr Gln Leu Glu Lys Glu Pro Ile Val Gly Ala
580 585 590

Glu Thr Phe Tyr Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Leu Gly
595 600 605

Lys Ala Gly Tyr Val Thr Asn Arg Gly Arg Gln Lys Val Val Ser Leu
610 615 620

Thr Asp Thr Thr Asn Gln Lys Thr Glu Leu Gln Ala Ile His Leu Ala
625 630 635 640

Leu Gln Asp Ser Gly Leu Glu Val Asn Ile Val Thr Asp Ser Gln Tyr
645 650 655

Ala Leu Gly Ile Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu
660 665 670

Val Ser Gln Ile Ile Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu
675 680 685

Ala Trp Val Pro Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp
690 695 700

Lys Leu Val Ser Ala Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile
705 710 715 720

Asp Lys Ala Gln Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala
725 730 735

Met Ala Ser Asp Phe Asn Leu Pro Pro Val Val Ala Lys Glu Ile Val
740 745 750

Ala Ser Cys Asp Lys Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln
755 760 765

Val Asp Cys Ser Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu
770 775 780

Gly Lys Val Ile Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu
785 790 795 800

Ala Glu Val Ile Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu
805 810 815

Leu Lys Leu Ala Gly Arg Trp Pro Val Lys Thr Ile His Thr Asp Asn
820 825 830

Gly Ser Asn Phe Thr Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala
835 840 845

Gly Ile Lys Gln Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly
850 855 860

Val Val Glu Ser Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val
865 870 875 880

Arg Asp Gln Ala Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe
885 890 895

Ile His Asn Phe Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly
900 905 910

Glu Arg Ile Val Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu
915 920 925

Gln Lys Gln Ile Thr Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp
930 935 940

Ser Arg Asp Pro Leu Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly
945 950 955 960

Glu Gly Ala Val Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro
965 970 975

Arg Arg Lys Ala Lys Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly
980 985 990

Asp Asp Cys Val Ala Ser Arg Gln Asp Glu Asp
995 1000

<210> 58 

<211> 116 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B rev protein 
sequence 

<400> 58 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Glu Leu Leu Lys Thr Val
1 5 10 15

Arg Leu Ile Lys Phe Leu Tyr Gln Ser Asn Pro Pro Pro Ser Pro Glu
20 25 30

Gly Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Glu Arg
35 40 45

Gln Arg Gln Ile Arg Ser Ile Ser Glu Arg Ile Leu Ser Thr Tyr Leu
50 55 60

Gly Arg Ser Ala Glu Pro Val Pro Leu Gln Leu Pro Pro Leu Glu Arg
65 70 75 80

Leu Thr Leu Asp Cys Ser Glu Asp Cys Gly Thr Ser Gly Thr Gln Gly
85 90 95

Val Gly Ser Pro Gln Ile Leu Val Glu Ser Pro Ala Val Leu Glu Ser
100 105 110

Gly Thr Lys Glu 
115 

<210> 59 

<211> 116 

<212> PRT 

<213> Artificial sequence



<400> 59 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Glu Leu Leu Lys Thr Val
1 5 10 15

Arg Leu Ile Lys Phe Leu Tyr Gln Ser Asn Pro Pro Pro Ser Pro Glu
20 25 30

Gly Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Glu Arg
35 40 45

Gln Arg Gln Ile Arg Ser Ile Ser Glu Trp Ile Leu Ser Thr Tyr Leu
50 55 60

Gly Arg Pro Ala Glu Pro Val Pro Leu Gln Leu Pro Pro Leu Glu Arg
65 70 75 80

Leu Thr Leu Asp Cys Asn Glu Asp Cys Gly Thr Ser Gly Thr Gln Gly
85 90 95

Val Gly Ser Pro Gln Ile Leu Val Glu Ser Pro Thr Val Leu Glu Ser
100 105 110

Gly Thr Lys Glu
115 

<210> 60 

<211> 116 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> Minimum of means center of tree reconstruction of clade B rev protein sequence

<400> 60 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Glu Leu Leu Lys Thr Val
1 5 10 15

Arg Leu Ile Lys Phe Leu Tyr Gln Ser Asn Pro Pro Pro Ser Pro Glu
20 25 30

Gly Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Glu Arg
35 40 45

Gln Arg Gln Ile Arg Ser Ile Ser Glu Trp Ile Leu Ser Thr Tyr Leu
50 55 60

Gly Arg Pro Ala Glu Pro Val Pro Leu Gln Leu Pro Pro Leu Glu Arg
65 70 75 80

Leu Thr Leu Asp Cys Ser Glu Asp Cys Gly Thr Ser Gly Thr Gln Gly
85 90 95

Val Gly Ser Pro Gln Ile Leu Val Glu Ser Pro Ala Val Leu Glu Ser
100 105 110 

Gly Thr Lys Glu 
115 

<210> 61 

<211> 101 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B tat protein sequence
<400> 61 

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser
1 5 10 15

Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys Cys Tyr
20 25 30

His Cys Gln Val Cys Phe Ile Thr Lys Gly Leu Gly Ile Ser Tyr Gly
35 40 45

Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr
50 55 60

His Gln Val Ser Leu Ser Lys Gln Pro Ala Ser Gln Pro Arg Gly Asp
65 70 75 80

Pro Thr Gly Pro Lys Glu Ser Lys Lys Lys Val Glu Arg Glu Thr Glu
85 90 95 

Thr Asp Pro Val Asp 
100 

<210> 62 
<211> 101 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares and minimum of means reconstruction of clade B tat 
protein sequence 

<400> 62 

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser 
1 5 10 15

Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys Cys Phe
20 25 30

His Cys Gln Val Cys Phe Ile Thr Lys Gly Leu Gly Ile Ser Tyr Gly
35 40 45

Arg Lys Lys Arg Arg Gln Arg Arg Arg Ala Pro Gln Asp Ser Gln Thr
50 55 60

His Gln Val Ser Leu Ser Lys Gln Pro Ala Ser Gln Pro Arg Gly Asp
65 70 75 80

Pro Thr Gly Pro Lys Glu Ser Lys Lys Lys Val Glu Arg Glu Thr Glu
85 90 95

Thr Asp Pro Val Asp
100 

<210> 63 
<211> 192 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade B vif protein 
sequence 

<400> 63 

Met Glu Asn Arg Trp Gln Val Met Ile Val Trp Gln Val Asp Arg Met
1 5 10 15

Arg Ile Arg Thr Trp Lys Ser Leu Val Lys His His Met Tyr Ile Ser
20 25 30

Lys Lys Ala Lys Gly Trp Phe Tyr Arg His His Tyr Glu Ser Thr His
35 40 45

Pro Arg Ile Ser Ser Glu Val His Ile Pro Leu Gly Asp Ala Arg Leu
50 55 60

Val Ile Lys Thr Tyr Trp Gly Leu His Thr Gly Glu Arg Glu Trp His
65 70 75 80

Leu Gly Gln Gly Val Ser Ile Glu Trp Arg Lys Arg Arg Tyr Ser Thr
85 90 95

Gln Val Asp Pro Gly Leu Ala Asp Gln Leu Ile His Leu Tyr Tyr Phe
100 105 110

Asp Cys Phe Ser Glu Ser Ala Ile Arg Asn Ala Ile Leu Gly His Ile
115 120 125

Val Ser Pro Arg Cys Glu Tyr Gln Ala Gly His Asn Lys Val Gly Ser
130 135 140

Leu Gln Tyr Leu Ala Leu Thr Ala Leu Ile Thr Pro Lys Lys Ile Lys
145 150 155 160

Pro Pro Leu Pro Ser Val Arg Lys Leu Thr Glu Asp Arg Trp Asn Lys
165 170 175

Pro Gln Lys Thr Lys Gly His Arg Gly Ser His Thr Met Asn Gly His
180 185 190 

<210> 64 
<211> 192 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares and minimum of means reconstruction of clade B vif 
protein sequence 

<400> 64 

Met Glu Asn Arg Trp Gln Val Met Ile Val Trp Gln Val Asp Arg Met
1 5 10 15

Arg Ile Arg Thr Trp Lys Ser Leu Val Lys His His Met Tyr Ile Ser
20 25 30

Arg Lys Ala Lys Gly Trp Phe Tyr Arg His His Tyr Glu Ser Thr His
35 40 45

Pro Arg Ile Ser Ser Glu Val His Ile Pro Leu Gly Asp Ala Arg Leu
50 55 60

Val Ile Thr Thr Tyr Trp Gly Leu His Thr Gly Glu Arg Asp Trp His
65 70 75 80

Leu Gly Gln Gly Val Ser Ile Glu Trp Arg Lys Lys Arg Tyr Ser Thr
85 90 95

Gln Val Asp Pro Asp Leu Ala Asp Gln Leu Ile His Leu Tyr Tyr Phe
100 105 110

Asp Cys Phe Ser Glu Ser Ala Ile Arg Asn Ala Ile Leu Gly His Ile
115 120 125

Val Ser Pro Arg Cys Glu Tyr Gln Ala Gly His Asn Lys Val Gly Ser
130 135 140

Leu Gln Tyr Leu Ala Leu Ala Ala Leu Ile Thr Pro Lys Lys Ile Lys
145 150 155 160

Pro Pro Leu Pro Ser Val Thr Lys Leu Thr Glu Asp Arg Trp Asn Lys
165 170 175

Pro Gln Lys Thr Lys Gly His Arg Gly Ser His Thr Met Asn Gly His
180 185 190

<210> 65 
<211> 96 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstruction of clade B vpr protein 
sequence 

<400> 65 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn
1 5 10 15

Glu Trp Thr Leu Glu Leu Leu Glu Glu Leu Lys Ser Glu Ala Val Arg
20 25 30

His Phe Pro Arg Leu Trp Leu His Ser Leu Gly Gln His Ile Tyr Glu
35 40 45

Thr Tyr Gly Asp Thr Trp Ala Gly Val Glu Ala Ile Ile Arg Ile Leu
50 55 60

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Gln His Ser Arg
65 70 75 80

Ile Gly Ile Thr Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser
85 90 95 

<210> 66 
<211> 96 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares and minimum of means reconstruction of clade B vpr
protein sequence 

<400> 66 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn
1 5 10 15

Glu Trp Thr Leu Glu Leu Leu Glu Glu Leu Lys Ser Glu Ala Val Arg
20 25 30

His Phe Pro Arg Ile Trp Leu His Ser Leu Gly Gln His Ile Tyr Glu
35 40 45

Thr Tyr Gly Asp Thr Trp Ala Gly Val Glu Ala Ile Ile Arg Ile Leu
50 55 60

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Arg His Ser Arg
65 70 75 80

Ile Gly Ile Thr Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser
85 90 95 

<210> 67 
<211> 96 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Minimum of means reconstructions for the clade B vpr protein sequence

<400> 67 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn
1 5 10 15

Glu Trp Thr Leu Glu Leu Leu Glu Glu Leu Lys Ser Glu Ala Val Arg
20 25 30

His Phe Pro Arg Ile Trp Leu His Ser Leu Gly Gln His Ile Tyr Glu
35 40 45

Thr Tyr Gly Asp Thr Trp Ala Gly Val Glu Ala Ile Ile Arg Ile Leu
50 55 60

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Gln His Ser Arg
65 70 75 80

Ile Gly Ile Thr Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser
85 90 95 

<210> 68 
<211> 81 
<212> PRT 

<213> Artificial sequence 
<220>

<223> Most recent common ancestor reconstruction of clade B vpu protein
sequence 

<400> 68 

Met Gln Pro Leu Glu Ile Leu Ala Ile Val Ala Leu Val Val Ala Ala
1 5 10 15

Ile Leu Ala Ile Val Val Trp Thr Ile Val Phe Ile Glu Tyr Arg Lys
20 25 30

Ile Leu Arg Gln Arg Lys Ile Asp Arg Leu Ile Asp Arg Ile Arg Glu
35 40 45

Arg Ala Glu Asp Ser Gly Asn Glu Ser Glu Gly Asp Gln Glu Glu Leu
50 55 60

Ser Ala Leu Val Glu Met Gly His His Ala Pro Trp Asp Val Asp Asp
65 70 75 80 

Leu 



<210> 69 

<211> 81 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> Least squares and minimum of means reconstructions for the clade 
B vpu protein sequences 

<400> 69 

Met Gln Pro Leu Gln Ile Leu Ala Ile Val Ala Leu Val Val Ala Ala
1 5 10 15

Ile Ile Ala Ile Val Val Trp Thr Ile Val Phe Ile Glu Tyr Arg Lys
20 25 30

Ile Leu Arg Gln Arg Lys Ile Asp Arg Leu Ile Asp Arg Ile Arg Glu
35 40 45

Arg Ala Glu Asp Ser Gly Asn Glu Ser Glu Gly Asp Gln Glu Glu Leu
50 55 60

Ser Ala Leu Val Glu Met Gly His His Ala Pro Trp Asp Val Asp Asp
65 70 75 80

<210> 70

<211> 1479 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstruction of clade C gag protein 
sequence 



<400> 70 

atgggtgcga gagcgtcaat attaagaggg ggaaaattag atacatggga aaaaattagg    60

ttaaggccag ggggaaagaa acattatatg ataaaacacc tagtatgggc aagcagggag    120

ctggaaagat ttgcacttaa ccctggcctt ttagagacat cagaaggctg taaacaaata    180

ataaaacagc tacaaccagc tcttcagaca ggaacagagg aacttaaatc attatataac    240

acagtagcaa ctctctattg tgtacatcaa aggatagagg tacgagacac caaggaagcc    300

ttagacaaga tagaggaaga acaaaacaaa agtcagcaaa aaacacagca ggcagaagcg    360

gctgacggaa aggtcagtca aaattatcct atagtgcaga atctccaagg gcaaatggta    420

caccaggcca tatcacctag aactttgaat gcatgggtaa aagtaataga ggagaaggct    480

ttcagcccag aggtaatacc catgtttaca gcattatcag aaggagccac cccacaagat    540

ttaaacacca tgttaaatac agtgggggga catcaagcag ccatgcaaat gttaaaagat    600

accatcaatg aggaggctgc agaatgggat aggttacatc cagtgcatgc agggcctgtt    660

gcaccaggcc aaatgagaga accaagggga agtgacatag caggaactac tagtaccctt    720

caggaacaaa tagcatggat gacaagtaac ccacctatcc cagtgggaga catctataaa    780

agatggataa ttctggggtt aaataaaata gtaagaatgt atagccctgt cagcattttg    840

gacataaaac aagggccaaa ggaacccttt agagactatg tagaccggtt ctttaaaact    900

ttaagagctg aacaagctac acaagatgta aaaaattgga tgacagacac cttgttggtc    960

caaaatgcga acccagattg taagaccatt ttaagagcat taggaccagg ggctacacta    1020

gaagaaatga tgacagcatg tcagggagtg ggaggaccta gccataaagc aagagttttg    1080

gctgaggcaa tgagccaagc aaacaataca aacataatga tgcagagagg caattttaag    1140

ggccctagaa gaattgttaa atgtttcaac tgtggcaagg aaggacacat agccagaaat    1200

tgcagggccc ctaggaaaaa gggctgttgg aaatgtggaa aggaaggaca ccaaatgaaa    1260

gactgtactg agaggcaggc taatttttta gggaaaattt ggccttccca caaggggagg    1320

ccagggaatt tccttcagag cagaccagag ccaacagccc caccagcaga gagcttcagg    1380

ttcgaggaga caacccccgc tccgaagcag gagccgaaag acagggaacc cttaacttcc    1440

ctcaaatcac tctttggcag cgaccccttg tctcaataa    1479



<210> 71 

<211> 1479 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares center minimum of means reconstructions for clade C gag gene



<400> 71 

atgggtgcga gagcgtcaat attaagaggc ggaaaattag atacatggga aaaaattagg    60

ttaaggccag ggggaaagaa acattatatg ctaaaacacc tagtatgggc aagcagggag    120

ctggaaagat ttgcacttaa ccctggcctt ttagagacat cagaaggctg taaacaaata    180

atgaaacagc tacaaccagc tcttcagaca ggaacagagg aacttagatc attatataac    240

acagtagcaa ctctctattg tgtacatgaa aagatagagg tacgagacac caaggaagcc    300

ttagacaaga tagaggaaga acaaaacaaa agtcagcaaa aaacacagca ggcagaagcg    360

gctgacggaa aggtcagtca aaattatcct atagtgcaga atctccaagg gcaaatggta    420

caccaggcca tatcacctag aactttgaat gcatgggtaa aagtaataga ggagaaggct    480

ttcagcccag aggtaatacc catgtttaca gcattatcag aaggagccac cccacaagat    540

ttaaacacca tgttaaatac agtgggggga catcaagcag ccatgcaaat gttaaaagat    600

accatcaatg aggaggctgc agaatgggat aggttacatc cagtacatgc agggcctgtt    660

gcaccaggcc aaatgagaga accaagggga agtgacatag caggaactac tagtaccctt    720

caggaacaaa tagcatggat gacaagtaac ccacctgttc cagtgggaga catctataaa    780

agatggataa ttctggggtt aaataaaata gtaagaatgt atagccctgt cagcattttg    840

gacataaaac aagggccaaa ggaacccttt agagactatg tagaccggtt ctttaaaact    900

ttaagagctg aacaagctac acaagatgta aaaaattgga tgacagacac cttgttggtc    960

caaaatgcga acccagattg taagaccatt ttaagagcat taggaccagg ggctacatta    1020

gaagaaatga tgacagcatg tcagggagtg ggaggacctg gccacaaagc aagagtgttg    1080

gctgaggcaa tgagccaagc aaacaataca aacataatga tgcagagaag caattttaaa    1140

ggccctaaaa gaattgttaa atgtttcaac tgtggcaagg aagggcacat agccagaaat    1200

tgcagggccc ctaggaaaaa aggctgttgg aaatgtggaa aggaaggaca ccaaatgaaa    1260

gactgtactg agaggcaggc taatttttta gggaaaattt ggccttccca caaggggagg    1320

ccagggaatt tccttcagag cagaccagag ccaacagccc caccagcaga gagcttcagg    1380

ttcgaggaga caacccccgc tccgaagcag gagccgaaag acagggaacc cttaacttcc    1440

ctcaaatcac tctttggcag cgaccccttg tctcaataa    1479

<210> 72

<211> 1482 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Minimum of means reconstructions for clade C gag gene



<400> 72 

atgggtgcga gagcgtcaat attaagaggc ggaaaattag atacatggga aaaaattagg    60

ttaaggccag ggggaaagaa acattatatg ctaaaacacc tagtatgggc aagcagggag    120

ctggaaagat ttgcacttaa ccctggcctt ttagagacat cagaaggctg taaacaaata    180

atgaaacagc tacaaccagc tcttcagaca ggaacagagg aacttagatc attatataac    240

acagtagcaa ctctctattg tgtacatgaa aagatagagg tacgagacac caaggaagcc    300

ttagacaaga tagaggaaga acaaaacaaa agtcagcaaa aaacacagca ggcagaagcg    360

gctgctgacg gaaaggtcag tcaaaattat cctatagtgc agaatctcca agggcaaatg    420

gtacaccagg ccatatcacc tagaactttg aatgcatggg taaaagtaat agaggagaag    480

gctttcagcc cagaggtaat acccatgttt acagcattat cagaaggagc caccccacaa    540

gatttaaaca ccatgttaaa tacagtgggg ggacatcaag cagccatgca aatgttaaaa    600

gataccatca atgaggaggc tgcagaatgg gataggttac atccagtaca tgcagggcct    660

gttgcaccag gccaaatgag agaaccaagg ggaagtgaca tagcaggaac tactagtacc    720

cttcaggaac aaatagcatg gatgacaagt aacccacctg ttccagtggg agacatctat    780

aaaagatgga taattctggg gttaaataaa atagtaagaa tgtatagccc tgtcagcatt    840

ttggacataa aacaagggcc aaaggaaccc tttagagact atgtagaccg gttctttaaa    900

actttaagag ctgaacaagc tacacaagat gtaaaaaatt ggatgacaga caccttgttg    960

gtccaaaatg cgaacccaga ttgtaagacc attttaagag cattaggacc aggggctaca    1020

ttagaagaaa tgatgacagc atgtcaggga gtgggaggac ctggccacaa agcaagagtg    1080

ttggctgagg caatgagcca agcaaacaat acaaacataa tgatgcagag aagcaatttt    1140

aaaggcccta aaagaattgt taaatgtttc aactgtggca aggaagggca catagccaga    1200

aattgcaggg cccctaggaa aaaaggctgt tggaaatgtg gaaaggaagg acaccaaatg    1260

aaagactgta ctgagaggca ggctaatttt ttagggaaaa tttggccttc ccacaagggg    1320

aggccaggga atttccttca gagcagacca gagccaacag ccccaccagc agagagcttc    1380

aggttcgagg agacaacccc cgctccgaag caggagccga aagacaggga acccttaact    1440

tccctcaaat cactctttgg cagcgacccc ttgtctcaat aa    1482






16336-13-2.ST25.txt 

<210> 73 

<211> 2547 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstructions of clade C env gene 



<400> 73 

atgagagtga tggggataca gaggaattgt caacaatggt ggatatgggg catcttaggc    60

ttttggatgt taatgatttg tagtgtggtg gggaacttgt gggtcacagt ctattatggg    120

gtacctgtgt ggaaagaagc aaaaactact ctattttgtg catcagatgc taaagcatat    180

gagagagaag tgcataatgt ctgggctaca catgcctgtg tacccacaga ccccaaccca    240

caagaaatgg ttttggaaaa tgtaacagaa aattttaaca tgtggaaaaa tgacatggtg    300

gatcagatgc atgaggatat aatcagttta tgggatcaaa gcctaaagcc atgtgtaaag    360

ttgaccccac tctgtgtcac tttaaactgt actaatgtta ataatactaa taataccaat    420

agtaccatga atggagaaat gaaaaattgc tctttcaata taaccacaga aataagagat    480

aagaagaaga aagaatatgc acttttttat agacttgata tagtaccact taatgaaaat    540

aataacaata ctagtgaata tagattaata aattgtaata cctcagccat aacacaagcc    600

tgtccaaagg tctcttttga cccaattcct atacattatt gtgctccagc tggttatgcg    660

attctaaagt gtaataataa gacattcaat ggaacaggac catgcaaaaa tgtcagcaca    720

gtacaatgta cacatggaat taagccagtg gtatcaactc aactactgtt aaatggtagt    780

ctagcagaag aagagataat aattagatct gaaaatctga caaacaatgc caaaacaata    840

atagtacagc ttaatgaatc tgtagaaatt gtgtgtacaa gacccaacaa taatacaaga    900

aaaagtatga ggataggacc aggacaaaca ttctatgcaa caggagacat aataggagat    960

ataagacaag cacattgtaa cattagtgga agggaatgga ataacacttt acaacaggta    1020

gctgaaaaat taagaaaaca cttccctaat aaaacaataa aatttgcacc atcctcagga    1080

ggggacctag aaattacaac acatagcttt aattgtagag gagaattttt ctattgcaat    1140

acatcaaaac tgtttaatag tacatacaat agtacaaata gtacaaattc aaccatcaca    1200

ctcccatgca gaataaaaca aattataaac atgtggcagg gggtaggaca agcaatgtat    1260

gcccctccca ttgcaggaaa cataacatgt aaatcaaata tcacaggact actattgaca    1320

cgtgatggag gaaaaaatga aactaatgaa actgagacat tcagacctgg aggaggagat    1380

atgagggaca attggagaag tgaattatat aaatataaag tagtagaaat taaaccatta    1440

ggagtagcac ccactaaggc aaaaaggaga gtggtggaga gagaaaaaag agcagtggga    1500

ctaggagctg tgttccttgg gttcttggga gcagcaggaa gcactatggg cgcagcgtca    1560

ataacgctga cggtacaggc cagacaatta ttgtctggta tagtgcaaca

ttgctgaggg ctatagaggc gcaacagcat atgttgcaac tcacagtctg gggcattaag    1680

cagctccagg caagagtcct ggctatggaa agatacctaa aggatcaaca gctcctaggg    1740


atttggggct 


gctctggaaa 


actcatctgc 


accactgctg 




f tr* +" a fi +* ^ ri n 
L. 1.1m L.ciy L. Lyy 


1800 


agtaataaat 


ctcaagatga 


tatttgggat 


aacatgacct 


ggaxgyagxy 


nn3'^3/13n3 3 

y y ciL ciy ay cia. 


1860 


attaacaatt 


acacagacac 


aatatacagg 


ttgcttgaag 


del X C y C d a aci 


r*r*a/i/"3nn3 3 
i.v.ciy ^dyydd 


1920 


aaaaatgaac 


aagatttatt 


ggcattggac 


agttgggaaa 


auc Ly Lyyaa 


LT.yy uxx^yac 


1980 


atatcaaatt 


ggctgtggta 


tataaaaata 


ttcataatga 


Lagnaggagg 


c Lxga Ldgy L 


2040 


ttaagaataa 


tttttgctgt 


gctttctata 


gtaaatagag 


traggcaggg 


dLacucacc u 


2100 


ttgtcgtttc 


agacccttac 


cccaaacccg 


aggggacccg 


acaggcLcga 


ddgdaxcgaa 


2160 


gaagaaggtg 


gagagcaaga 


cagagacaga 


tccattcgat 


tagtgagcgg 


attcttagca 


2220 


cttgcctggg 


acgacctgcg 


gagcctgtgc 


ctcttcagct 


accaccgctt 


gagagacttc 


2280 


atcttgattg 


cagcgaggac 


tgtggaactt 


ctgggacgca 


gcagtctcag 


gggactacag 


2340 


agggggtggg 


aagcccttaa 


atatctggga 


agtcttgtgc 


agtattgggg 


tcaggagcta 


2400 


aaaaagagtg 


ctattagtct 


gcttgatacc 


atagcaatag 


cagtagctga 


agggacagat 


2460 


aggattatag 


aagtagtaca 


aagagcttgt 


agagctatcc 


tcaacatacc 


tagaagaata 


2520 


agacagggct 


ttgaagcagc 


tttgcaa 








2547 



<210> 74 

<211> 2547 

<212> DNA 

<213> Artificial sequence 
<220> 



<223> Least squares and minimum of means reconstructions for clade C env gene



<400> 74 
atgagagtga 


gggggatact 


gaggaattgt 


caacaatggt 


ggatatgggg 


catcttaggc 


60 


ttttggatgt 


taatgatttg 


taatgtggtg 


gggaacttgt 


gggtcacagt 


ctattatggg 


120 


gtacctgtgt 


ggaaagaagc 


aaaaactact 


ctattctgtg 


catcagatgc 


taaagcatat 


180 


gagaaagaag 


tgcataatgt 


ctgggctaca 


catgcctgtg 


tacccacaga 


ccccaaccca 


240 


caagaaatgg 


ttttggaaaa 


tgtaacagaa 


aattttaaca 


tgtggaaaaa 


tgacatggtg 


300 


gatcagatgc 


atgaggatat 


aatcagttta 


tgggatcaaa 


gcctaaagcc 


atgtgtaaag 


360 


ttgaccccac 


tctgtgtcac 


tttaaattgt 


agtaatgtta 


atgctaccaa 


tactaccaat 


420 


aataccatga 


agggagaaat 


aaaaaattgc 


tctttcaatg 


caaccacaga 


aataagagat 


480 


aagaaacaga 


aagtgtatgc 


acttttttat 


agacttgata 


tagtaccact 


taatgagaat 


540 
aatagcaatt


ctagtgagta 


tagattaata 


aattgtaata 


cctcagccat 


aacacaagcc 


600 


tgtccaaagg 


tctcttttga 


cccaattcct 


atacattatt 


gtgctccagc 


tggttatgcg 


660 


attctaaagt 


gtaataataa 


gacattcaat 


ggaacaggac 


catgcaataa 


tgtcagcaca 


720 


gtacaatgta 


cacatggaat 


taagccagtg 


gtatcaactc 


aactactgtt 


aaatggtagc 


780 


ctagcagaag 


aagagataat 


aattagatct 


gaaaatctga 


caaacaatgt 


caaaacaata 


840 


atagtacatc 


ttaatgaatc 


tgtagaaatt 


gtgtgtacaa 


gacccaacaa 


taatacaaga 


900 


aaaagtataa 


ggataggacc 


aggacaaaca 


ttctatgcaa 


caggagacat 


aataggagac 


960 


ataagacaag 


cacattgtaa 


cattagtgaa 


gaggaatgga 


ataaaacttt 


acaaagggta 


1020 


ggtaaaaaat 


tagaagaaca 


cttccctaat 


aaaacaataa 


aatttgaacc 


atcctcagga 


1080 


ggggacctag 


aaattacaac 


acatagcttt 


aattgtagag 


gagaattttt 


ctattgcaat 


1140 


acatcaaaac 


tgtttaatag 


tacatacaat 


ggtacaaata 


gtacaaatac 


aaccatcaca 


1200 


ctcccatgca 


gaataaaaca 


aattataaac 


atgtggcagg 


aggtaggacg 


agcaatgtat 


1260 


gcccctccca 


ttgcaggaaa 


cataacatgt 


aaatcaaata 


tcacaggact 


actattggta 


1320 


cgtgatggag 


gaaaaaataa 


cacaaataac 


acagagatat 


tcagacctgg 


aggaggagat 


1380 


atgagggaca 


attggagaag 


tgaattatat 


aaatataaag 


tggtagaaat 


taagccattg 


1440 


ggaatagcac 


ccactaaggc 


aaaaaggaga 


gtggtggaga 


gagaaaaaag 


agcagtggga 


1500 


ataggagctg 


tgttccttgg 


gttcttggga 


gcagcaggaa 


gcactatggg 


cgcggcgtca 


1560 


ataacgctga 


cggtacaggc 


cagacaattg 


ttgtctggta 


tagtgcaaca 


gcaaagcaat 


1620 


ttgctgaggg 


ctatagaggc 


gcaacagcat 


atgttgcaac 


tcacggtctg 


gggcattaag 


1680 


cagctccaga 


caagagtcct 


ggctatagaa 


agatacctaa 


aggatcaaca 


gctcctaggg 


1740 


atttggggct 


gctctggaaa 


actcatctgc 


accactgctg 


tgccttggaa 


ctctagttgg 


1800 


agtaataaat 


ctcaagaaga 


tatttgggat 


aacatgacct 


ggatgcagtg 


ggatagagaa 


1860 


attagtaatt 


acacagacac 


aatatacagg 


ttgcttgaag 


actcgcaaaa 


ccagcaggaa 


1920 


caaaatgaaa 


aagatttact 


agcattggac 


agttggaaaa 


atctgtggaa 


ttggtttgac 


1980 


ataacaaatt 


ggctgtggta 


tataaaaata 


ttcataatga 


tagtaggagg 


cttgataggt 


2040 


ttaagaataa 


tttttgctgt 


gctttctata 


gtgaatagag 


ttaggcaggg 


atactcacct 


2100 


ttgtcgtttc 


agacccttac 


cccaaacccg 


aggggacccg 


acaggctcgg 


aagaatcgaa 


2160 


gaagaaggtg 


gagagcaaga 


cagagacaga 


tccattcgat 


tagtgagcgg 


attcttagca 


2220 


cttgcctggg 


acgacctgcg 


gagcctgtgc 


ctcttcagct 


accaccgatt 


gagagacttc 


2280 


atattggtgg 


cagcgagagc 


ggtggaactt 


ctgggacgca 


gcagtctcag


gggactacag 


2340 


agggggtggg 


aagcccttaa 


gtatctggga 


agtcttgtgc 


agtattgggg 


tctggagcta 


2400 


aaaaagagtg 


ctattagtct 


gcttgatacc 


atagcaatag 


cagtagctga 


aggaacagat 


2460 


aggattatag 


aattaataca 


aagaatttgt 


agagctatcc 


gcaacatacc 


tagaagaata 


2520 


agacagggct 


ttgaagcagc 


tttgcaa 








2547 
<210> 75
<211> 2550 
<212> DNA 

<213> Artificial sequence 



<220> 

<223> Minimum of means reconstructions for clade C env gene

<400> 75 



atgagagtga 


gggggatact 


gaggaattgt 


caacaatggt 


ggatatgggg 


catcttaggc 


60 


ttttggatgt 


taatgatttg 


taatgtggtg 


gggaacttgt 


gggtcacagt 


ctattatggg 


120 


gtacctgtgt 


ggaaagaagc 


aaaaactact 


ctattctgtg 


catcagatgc 


taaagcatat 


180 


gagaaagaag 


tgcataatgt 


ctgggctaca 


catgcctgtg 


tacccacaga 


ccccaaccca 


240 


caagaaatgg 


ttttggaaaa 


tgtaacagaa 


aattttaaca 


tgtggaaaaa 


tgacatggtg 


300 


gatcagatgc 


atgaggatat 


aatcagttta 


tgggatcaaa 


gcctaaagcc 


atgtgtaaag 


360 


ttgaccccac 


tctgtgtcac 


tttaaattgt 


agtaatgtta 


atactaccaa 


tactaccaat 


420 


aataccatga 


aaggagaaat 


aaaaaattgc 


tctttcaatg 


taaccacaga 


actaagagat 


480 


aagaaaaaga 


aagagtatgc 


acttttttat 


agacttgata 


tagtaccact 


taatgagaat 


540 


aataacaatt 


ctagtgagta 


tagattaata 


aattgtaata 


cctcagccat 


aacacaagcc 


600 


tgtccaaagg 


tctcttttga 


cccaattcct 


atacattatt 


gtgctccagc 


tggttatgcg 


660 


attctaaagt 


gtaataataa 


gacattcaat 


ggaacaggac 


catgcaataa tgtcagcaca


720 


gtacaatgta 


cacatggaat 


taagccagtg 


gtatcaactc 


aactactgtt 


aaatggtagc 


780 


ctagcagaag 


aagagataat 


aattagatct 


gaaaatctga 


caaacaatgc 


caaaacaata 


840 


atagtacatc 


ttaatgaatc 


tgtagaaatt 


gtgtgtacaa 


gacccaacaa 


taatacaaga 


900 


aaaagtataa 


ggataggacc 


aggacaaaca 


ttctatgcaa 


caggagacat 


aataggagac 


960 


ataagacaag 


cacattgtaa 


cattagtgaa 


gaggaatgga 


ataaaacttt 


acaaagggta 


1020 


ggtaaaaaat 


tagaagaaca 


cttccctaat 


aaaacaataa 


aatttgaacc 


atcctcagga 


1080 


ggggacctag 


aaattacaac 


acatagcttt 


aattgtagag 


gagaattttt 


ctattgcaat 


1140 


acatcaaaac 


tgtttaatag 


tacatacaat 


ggtacaaata 


gtacaaattc 


aaccatcaca 


1200 


ctccaatgca 


gaataaaaca 


aattataaac 


atgtggcagg 


aggtaggacg 


agcaatgtat 


1260 


gcccctccca 


ttgcaggaaa 


cataacatgt 


aaatcaaata 


tcacaggact 


actattggta 


1320 


cgtgatggag 


gaaaaaatga 


cacaaatgac 


acagagatat 


tcagacctgg 


aggaggagat 


1380 


atgagggaca 


attggagaag 


tgaattatat 


aaatataaag 


tggtagaaat 


taagccattg 


1440 


ggaatagcac 


ccactaaggc 


aaaaaggaga 


gtggtggaga 


gagaaaaaag 


agcagtggga 


1500 


ataggagctg 


tgttccttgg 


gttcttggga 


gcagcaggaa 


gcactatggg 


cgcagcgtca 


1560 
ataacgctga


cggtacaggc 


cagacaattg 


ttgtctggta 


tagtgcaaca 


gcaaagcaat 


1620 


ttgctgaggg 


ctatagaggc 


gcaacagcat 


atgttgcaac 


tcacggtctg 


gggcattaag 


1680 


cagctccaga 


caagagtcct 


ggctatagaa 


agatacctaa 


aggatcaaca 


gctcctaggg 


1740 


atttggggct 


gctctggaaa 


actcatctgc 


accactgctg 


tgccttggaa 


ctctagttgg 


1800 


agtaataaat 


ctcaagagga 


tatttgggat 


aacatgacct 


ggatgcagtg 


ggatagagaa 


1860 


attagtaatt 


acacagacac 


aatatacagg 


ttgcttgaag 


actcgcaaaa 


ccagcaggaa 


1920 


caaaatgaaa 


aagatttact 


agcattggac 


agttggaaaa 


atctgtggaa 


ttggtttgac 


1980 


ataacaaatt 


ggctgtggta 


tataaaaata 


ttcataatga 


tagtaggagg 


cttgataggt 


2040 


ttaagaataa 


tttttgctgt 


gctttctata 


gtgaatagag 


ttaggcaggg 


atactcacct 


2100 


ttgtcgtttc 


agacccttac 


cccaaacccg 


aggggacccg 


acaggctcgg 


aagaatcgaa 


2160 


gaagaaggtg 


gagagcaaga 


cagagacaga 


tccattcgat 


tagtgagcgg 


attcttagca 


2220 


cttgcctggg 


acgacctgcg 


gagcctgtgc 


ctcttcagct 


accaccgatt 


gagagacttc 


2280 


atattggtgg 


cagcgagagc 


ggtggaactt 


ctgggacgca 


gcagtctcag 


gggactacag 


2340 


agggggtggg 


aagcccttaa 


gtatctggga 


agtcttgtgc 


agtattgggg 


tctggagcta 


2400 


aaaaagagtg 


ctattagtct 


gcttgatacc 


atagcaatag 


cagtagctga 


aggaacagat 


2460 


aggattatag 


aattaataca 


aagaatttgt 


agagctatcc 


gcaacatacc 


tagaagaata 


2520 


agacagggct 


ttgaagcagc 


tttgcaataa 








2550 



<210> 76 
<211> 618 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstructions of clade C nef gene 
<400> 76 

atggggggca agtggtcaaa aagcagtata gttggatggc ctgctgtaag agaaagaata 60 
agacgaactg ctccagcagc agaaggagta ggagcagcgt ctcaagactt agataaacat 120 
ggagcactta caagcagcaa cacagccgcc actaatgctg attgtgcctg gctggaagca 180 
caagaggagg aagaagtagg ctttccagtc agacctcagg tgcctttaag accaatgact 240 
tataagggag cagtcgatct cagcttcttt ttaaaagaaa aggggggact ggaagggtta 300 
atttactcta agaaaaggca agagatcctt gatttgtggg tctatcacac acaaggctac 360 
ttccctgatt ggcaaaacta cacaccggga ccagggatca gatttccact gacctttgga 420 
tggtgcttca agctagtgcc agttgaccca agggaagtag aagaggccaa tgaaggagag 480 
aacaactgct tgctacaccc tatgagccag catggaatgg aggatgaaga cagagaagta 540 
ttaaagtgga agtttgacag tcacctagca cgcagacaca tggcccgcga gctacatccg 600

gagtattaca aagactgc 618 

<210> 77 

<211> 624 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares and minimum of means reconstructions of clade C nef gene



<400> 77 
atggggggca 


agtggtcaaa 


aagcagtata 


gttggatggc 


ctgctgtaag 


agaaagaata 


60 


agacgaactg 


agccagcagc 


agagggagta 


ggagcagcgt 


ctcaagactt 


agataaacat 


120 


ggagcactta 


caagcagcaa 


cacagccgcc 


aataatgctg 


attgtgcctg 


gctggaagca 


180 


caagaggagg 


aagaagaagt 


aggctttcca 


gtcagacctc 


aggtgccttt 


aagaccaatg 


240 


acttataagg 


gagcattcga 


tctcagcttc 


tttttaaaag 


aaaagggggg 


actggaaggg 


300 


ttaatttact 


ctaagaaaag 


gcaagagatc 


cttgatttgt 


gggtctatca 


cacacaaggc 


360 


tacttccctg 


attggcaaaa 


ctacacaccg 


ggaccagggg 


tcagatatcc 


actgaccttt 


420 


ggatggtgct 


tcaagctagt 


gccagttgac 


ccaagggaag 


tagaagaggc 


caacgaagga 


480 


gagaacaact 


gtttgctaca 


ccctatgagc 


cagcatggaa 


tggaggatga 


agacagagaa 


540 


gtattaaagt 


ggaagtttga 


cagtcaccta 


gcacgcagac 


acatggcccg 


cgagctacat 


600 


ccggagtatt 


acaaagactg 


ctga 








624 



<210> 78 
<211> 3000 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstructions of clade C pol 
<400> 78 

ttttttaggg aaaatttggc cttcccacaa ggggaggcca gggaatttcc ttcagagcag 60

accagagcca acagccccac cagcagagag cttcaggttc gaggagacaa cccccgctcc 120

gaagcaggag ccgaaagaca gggaaccctt aacttccctc aaatcactct ttggcagcga 180

ccccttgtct caataaaagt agggggccag ataaaggaag ctctattaga tacaggagca 240

gatgatacag tattagaaga cataaatttg ccaggaaaat ggaaaccaaa aatgataggg 300
ggaattggag


gttttatcaa 


agtaagacag 


tatgatcaaa 


tacttataga 


aatttgtgga 


360 


aaaaaggcta 


taggtacagt 


attagtagga 


cctacacctg 


tcaacataat 


tggaagaaat 


420 


atgttgactc 


agcttggttg 


cactctaaat 


tttccaatta 


gtcctattga 


aactgtacca 


480 


gtaaaattaa 


agccaggaat 


ggatggccca 


aaggttaaac 


aatggccatt 


gacagaagag 


540 


aaaataaaag 


cattaacagc 


aatttgtgaa 


gaaatggaaa 


aggaaggaaa 


aattacaaaa 


600 


attgggcctg 


aaaatccata 


taacactcca 


gtatttgcca 


taaaaaagaa 


ggacagtact 


660 


aagtggagaa 


aattagtaga 


tttcagagaa 


ctcaataaaa 


gaactcaaga 


cttctgggaa 


720 


gttcaattag 


gaataccaca 


cccagcaggg 


ttaaaaaaga 


aaaaatcagt 


aacagtactg 


780 


gatgtggggg 


atgcatattt 


ttcagttcct 


ttagatgaag 


acttcaggaa 


atatactgca 


840 


ttcaccatac 


ctagtataaa 


caatgaaaca 


ccagggatta 


gatatcaata 


taatgtgctt 


900 


ccacagggat 


ggaaaggatc 


accagcaata 


ttccagagta 


gcatgacaaa 


aatcttagag 


960 


ccctttaggg 


cacaaaaccc 


agaaatagtt 


atctatcaat 


acatggatga 


cttgtatgta 


1020 


ggatctgact 


tagaaatagg 


gcaacataga 


gcaaaaatag 


aggagttaag 


agaacatcta 


1080 


ttgaaatggg 


gatttaccac 


accagacaag 


aaacatcaga 


aagaaccccc 


atttctttgg 


1140 


atggggtatg 


aactccatcc 


tgacaaatgg 


acagtacagc 


ctatacagct 


gccagaaaag 


1200 


gatagctgga 


ctgtcaatga 


tatacagaag 


ttagtgggaa 


aattaaactg 


ggcaagtcag 


1260 


atttacccag 


ggattaaagt 


aaggcaactg 


tgtaaactcc 


ttaggggagc 


caaagcacta 


1320 


acagacatag 


taccactgac 


tgaagaagca 


gaattagaat 


tggcagagaa 


cagggaaatt 


1380 


ctaaaagaac 


cagtacatgg 


agtatattat 


gacccatcaa 


aagacttaat 


agctgaaata 


1440 


cagaaacagg 


ggcatgacca 


atggacatat 


caaatttacc 


aagaaccatt 


caaaaatctg 


1500 


aaaacaggaa 


agtatgcaaa 


aatgaggtct 


gcccacacta 


atgatgtaaa 


acaattaaca 


1560 


gaagcagtgc 


aaaaaatagc 


catggaaagc 


atagtaatat 


ggggaaagac 


tcctaaattt 


1620 


agactaccca 


tccaaaaaga 


aacatgggag 


acatggtgga 


cagactattg 


gcaagccacc 


1680 


tggattcctg 


agtgggagtt 


tgttaatacc 


cctcccctag 


taaaattatg 


gtaccagcta 


1740 


gaaaaagaac 


ccatagcagg 


agcagaaact 


ttctatgtag 


atggggcagc 


taatagggaa 


1800 


actaaactag 


gaaaagcagg 


gtatgttact 


gacaaaggaa 


gacagaaagt 


tgtttctcta 


1860 


actgaaacaa 


caaatcagaa 


gactgaatta 


caagcaattc 


agctagcttt 


gcaggattca 


1920 


ggatcagaag 


taaacatagt 


aacagactca 


caatatgcat 


taggaatcat 


tcaagcacaa 


1980 


ccagataaga 


gtgaatcaga 


gttagtcaat 


caaataatag 


agcagttaat 


aaaaaaggaa 


2040 


aaggtctacc 


tgtcatgggt 


accagcacat 


aaaggaattg 


gaggaaatga 


acaagtagat 


2100 


aaattagtaa 


gttctggaat 


caggaaagtg 


ctgtttctag 


atggaataga 


taaagctcaa 


2160 


gaagaacatg 


aaaaatatca 


cagcaattgg 


agagcaatgg 


ctagtgagtt 


taatctgcca 


2220 


cccatagtag 


caaaagaaat 


agtagctagc 


tgtgataaat 


gtcagctaaa 


aggggaagcc 


2280 


atgcatggac 


aagtagactg 


tagtccaggg 


atatggcaat tagattgtac 
ggaaaagtta


tcctggtagc 


agtccatgta 


gccagtggct 


acatagaagc 


agaagttatc 


2400 


ccagcagaaa 


caggacagga 


aacagcatac 


tttatattaa 


aattagcagg 


aagatggcca 


2460 


gtaaaagtaa 


tacatacaga 


caatggcagc 


aatttcacca 


gtgctgcagt 


taaggcagcc 


2520 


tgttggtggg 


caggtatcca 


acaggaattt 


ggaattccct 


acaatcccca 


aagtcaggga 


2580 


gtagtagaat 


ccatgaataa 


agaattaaag 


aaaatcatag 


ggcaggtaag 


agatcaagct 


2640 


gagcacctta 


agacagcagt 


acaaatggca 


gtattcattc 


acaattttaa 


aagaaaaggg 


2700 


gggattgggg 


ggtacagtgc 


aggggaaaga 


ataatagaca 


taatagcaac 


agacatacaa 


2760 


actaaagaat 


tacaaaaaca 


aattataaaa 


attcaaaatt 


ttcgggttta 


ttacagagac 


2820 


agcagagacc 


ctgtttggaa 


aggaccagcc 


aaactactct 


ggaaaggtga 


aggggcagta 


2880 


gtaatacaag 


acaatagtga 


cataaaggta 


gtaccaagga 


ggaaagcaaa 


gatcattagg 


2940 


gattatggaa 


aacagatggc 


aggtgctgat 


tgtgtggcag 


gtagacagga 


tgaagattag 


3000 



<210> 79 

<211> 2999 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares reconstruction for clade C pol gene



<400> 79 
ttttttaggg 


aaaatttggc 


cttcccacaa 


ggggaggcca 


gggaatttcc 


ttcagagcag 


60 


accagagcca 


acagccccac 


cagcagagag 


cttcaggttc 


gaggagacaa 


cccccgctcc 


120 


gaagcaggag 


ccgaaagaca 


gggaaccctt 


aacttccctc 


aaatcactct 


ttggcagcga 


180 


ccccttgtct 


caataaaagt 


agggggccag 


ataaaggagg 


ctctcttaga 


cacaggagca 


240 


gatgatacag 


tattagaaga 


aataaatttg 


ccaggaaaat 


ggaaaccaaa 


aatgatagga 


300 


ggaattggag 


gttttatcaa 


agtaagacag 


tatgatcaaa 


tacttataga 


aatttgtgga 


360 


aaaaaggcta 


taggtacagt 


attagtagga 


cctacacctg 


tcaacataat 


tggaagaaat 


420 


atgttgactc 


agcttggatg 


cacactaaat 


tttccaatta 


gtcccattga 


aactgtacca 


480 


gtaaaattaa 


agccaggaat 


ggatggccca 


aaggttaaac 


aatggccatt 


gacagaagag 


540 


aaaataaaac 


attaacagca 


atttgtgaag 


aaatggagaa 


ggaaggaaaa 


attacaaaaa 


600 


ttgggcctga 


aaatccatat 


aacactccag 


tatttgccat 


aaaaaagaag 


gacagtacta 


660 


agtggagaaa 


attagtagat 


ttcagggaac 


tcaataaaag 


aactcaagac 


ttttgggaag 


720 


ttcaattagg 


aataccacac 


ccagcagggt 


taaaaaagaa 


aaaatcagtg 


acagtactgg 


780 


atgtggggga 


tgcatatttt 


tcagttcctt 


tagatgaagg 


cttcaggaaa 


tatactgcat 


840 


tcaccatacc 


tagtataaac 


aatgaaacac 


cagggattag atatcaatat 
cacagggatg


gaaaggatca 


ccagcaatat 


tccagagtag 


catgacaaaa 


atcttagagc 


960 


cctttagggc 


acaaaatcca 


gaaatagtca 


tctatcaata 


tatggatgac 


ttgtatgtag 


1020 


gatctgactt 


agaaataggg 


caacatagag 


caaaaataga 


ggagttaaga 


gaacatctat 


1080 


taaagtgggg 


atttaccaca 


ccagacaaga 


aacatcagaa 


agaaccccca 


tttctttgga 


1140 


tggggtatga 


actccatcct 


gacaaatgga 


cagtacagcc 


tatacagctg 


ccagaaaagg 


1200 


atagctggac 


tgtcaatgat 


atacagaagt 


tagtgggaaa 


attaaactgg 


gcaagtcaga 


1260 


tttacccagg 


gattaaagta 


aggcaacttt 


gtaaactcct 


taggggggcc 


aaagcactaa 


1320 


cagacatagt 


accactaact 


gaagaagcag 


aattagaatt 


ggcagagaac 


agggaaattc 


1380 


taaaagaacc 


agtacatgga 


gtatattatg 


acccatcaaa 


agacttgata 


gctgaaatac 


1440 


agaaacaggg 


gcatgaccaa 


tggacatatc 


aaatttacca 


agaaccattc 


aaaaatctga 


1500 


aaacagggaa 


gtatgcaaaa 


atgaggactg 


cccacactaa 


tgatgtaaaa 


cagttaacag 


1560 


aggcagtgca 


aaaaatagcc 


atggaaagca 


tagtaatatg 


gggaaagact 


cctaaattta 


1620 


gactacccat 


ccaaaaagaa 


acatgggaga 


catggtggac 


agactattgg 


caagccacct 


1680 


ggattcctga 


gtgggagttt 


gttaataccc 


ctcccctagt 


aaaattatgg 


taccagctgg 


1740 


agaaagaacc 


catagcagga 


gcagaaactt 


tctatgtaga 


tggagcagct 


aatagggaaa 


1800 


ctaaaatagg 


aaaagcaggg 


tatgttactg 


acagaggaag 


gcagaaaatt 


gtttctctaa 


1860 


ctgaaacaac 


aaatcagaag 


actgaattac 


aagcaattca 


gctagctttg 


caagattcag 


1920 


gatcagaagt 


aaacatagta 


acagactcac 


agtatgcatt 


aggaatcatt 


caagcacaac 


1980 


cagataagag 


tgaatcagag 


ttagtcaacc 


aaataataga 


acaattaata 


aaaaaggaaa 


2040 


gggtctacct 


gtcatgggta 


ccagcacata 


aaggaattgg 


aggaaatgaa 


caagtagata 


2100 


aattagtaag 


tagtggaatc 


aggaaagtgc 


tgtttctaga 


tggaatagat 


aaggctcaag 


2160 


aagagcatga 


aaagtatcac 


agcaattgga 


gagcaatggc 


tagtgagttt 


aatctgccac 


2220 


ccatagtagc 


aaaagaaata 


gtagctagct 


gtgataaatg 


tcagctaaaa 


ggggaagcca 


2280 


tacatggaca 


agtagactgt 


agtccaggga 


tatggcaatt 


agattgtaca 


catttagaag 


2340 


gaaaaatcat 


cctggtagca 


gtccatgtag 


ccagtggcta 


catagaagca 


gaggttatcc 


2400 


cagcagaaac 


aggacaagaa 


acagcatact 


ttatactaaa 


attagcagga 


agatggccag 


2460 


tcaaagtaat 


acatacagac 


aatggcagta 


atttcaccag 


tgctgcagtt 


aaggcagcct 


2520 


gttggtgggc 


aggtatccaa 


caggaatttg 


gaattcccta 


caatccccaa 


agtcagggag 


2580 


tagtagaatc 


catgaataaa 


gaattaaaga 


aaatcatagg 


gcaggtaaga 


gatcaagctg 


2640 


agcaccttaa 


gacagcagta 


caaatggcag 


tattcattca 


caattttaaa 


agaaaagggg 


2700 


ggattggggg 


gtacagtgca 


ggggaaagaa 


taatagacat 


aatagcaaca 


gacatacaaa 


2760 


ctaaagaatt 


acaaaaacaa 


attataaaaa 


ttcaaaattt 


tcgggtttat 


tacagagaca 


2820 


gcagagaccc 


tatttggaaa 


ggaccagcca 


aactactctg 


gaaaggtgaa 


ggggcagtag 


2880 


taatacaaga 


taatagtgac 


ataaaggtag 


taccaaggag gaaagcaaaa 
atcattaagg


2940


actatggaaa acagatggca ggtgctgatt gtgtggcagg tagacaggat gaagattag



2999 



<210> 80 

<211> 3000 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Minimum of means reconstructions of clade C pol gene



<400> 80 
ttttttaggg 


aaaatttggc 


cttcccacaa 


ggggaggcca 


gggaatttcc 


ttcagagcag 


60 


accagagcca 


acagccccac 


cagcagagag 


cttcaggttc 


gaggagacaa 


ccccctctcc 


120 


gaagcaggag 


ccgaaagaca 


gggaaccctt 


aacttccctc 


aaatcactct 


ttggcagcga 


180 


ccccttgtca 


caataaaagt 


agggggacag 


ctaaaggagg 


ctctcttaga 


cacaggagca 


240 


gatgatacag 


tattagaaga 


aataaatttg 


ccaggaaaat 


ggaaaccaaa 


aatgatagga 


300 


ggaattggag 


gttttatcaa 


agtaagacag 


tatgatcaaa 


tacttataga 


aatttgtgga 


360 


aaaaaggcta 


taggtacagt 


actagtagga 


cctacacctg 


tcaacataat 


tggaagaaat 


420 


atgttgactc 


agcttggatg 


cacactaaat 


tttccaatta 


gtcccattga 


aactgtacca 


480 


gtaaaattaa 


agccaggaat 


ggatggccca 


aaggtcaaac 


aatggccatt 


gacagaagag 


540 


aaaataaaag 


cattaacagc 


aatttgtgaa 


gaaatggaga 


aggaaggaaa 


aattacaaaa 


600 


attgggcctg 


aaaatccata 


taacactcca 


gtatttgcca 


taaaaaagaa 


ggacagtact 


660 


aagtggagaa 


aattagtaga 


tttcagggaa 


ctcaataaaa 


gaactcaaga 


cttttgggaa 


720 


gttcaattag 


ggataccaca 


cccagcaggg 


ttaaaaaaga 


aaaaatcagt 


gacagtactg 


780 


gatgtggggg 


atgcatattt 


ttcagttcct 


ttagatgaag 


gcttcaggaa 


atatactgca 


840 


ttcaccatac 


ctagtataaa 


caatgaaaca 


ccagggatta 


gatatcaata 


taatgtgctt 


900 


ccacagggat 


ggaaaggatc 


accagcaata 


ttccagagta 


gcatgacaaa 


aatcttagag 


960 


ccctttaggg 


cacaaaatcc 


agaaatagtt 


atctatcaat 


atatggatga 


cttgtatgta 


1020 


ggatctgact 


tagaaatagg 


gcaacataga 


gcaaaaatag 


aggagttaag 


agaacatcta 


1080 


ttgaagtggg 


gatttaccac 


accagacaag 


aaacatcaga 


aagaaccccc 


atttctttgg 


1140 


atggggtatg 


aactccatcc 


tgacaaatgg 


acagtacagc 


ctatacagct 


gccagaaaag 


1200 


gatagctgga 


ctgtcaatga 


tatacagaag 


ttagtgggaa 


aattaaactg 


ggcaagtcag 


1260 


atttacccag 


ggattaaagt 


aaggcaactg 


tgtaaactcc 


ttaggggagc 


caaagcacta 


1320 


acagacatag 


taccactaac 


tgaagaagca 


gaattagaat 


tggcagagaa 


cagggaaatt 


1380 


ctaaaagaac 


cagtacatgg 


agtatattat 


gacccatcaa 


aagacttaat 


agctgaaata 


1440 


cagaaacagg 


ggcatgacca 


atggacatat 


caaatttacc aagaaccatt 
1500


aaaacaggga


agtatgcaaa 


aatgaggact 


gcccacacta 


atgatgtaaa 


acagttaaca 


1560 


gaggcagtgc 


aaaaaatagc 


catggaaagc 


atagtaatat 


ggggaaagac 


tcctaaattt 


1620 


agattaccca 


tccagaaaga 


aacatgggag 


gcatggtgga 


cagactattg 


gcaagccacc 


1680 


tggattcctg 


agtgggagtt 


tgttaatacc 


cctcccctag 


taaaattatg 


gtaccagctg 


1740 


gagaaagaac 


ccatagcagg 


agcagaaact 


ttctatgtag 


atggagcagc 


taatagggaa 


1800 


actaaaatag 


gaaaagcagg 


gtatgttact 


gacagaggaa 


ggcagaaaat 


tgtttctcta 


1860 


actgaaacaa 


caaatcagaa 


gactgaatta 


caagcaattc 


agctagcttt 


gcaggattca 


1920 


ggatcagaag 


taaacatagt 


aacagactca 


cagtatgcat 


taggaatcat 


tcaagcacaa 


1980 


ccagataaga 


gtgaatcaga 


gttagtcaat 


caaataatag 


aacagttaat 


aaaaaaggaa 


2040 


agggtctacc 


tgtcatgggt 


accagcacat 


aaaggaattg 


gaggaaatga 


acaagtagat 


2100 


aaattagtaa 


gtagtggaat 


caggaaagtg 


ctgtttctag 


atggaataga 


taaggctcaa 


2160 


gaagagcatg 


aaaaatatca 


cagcaattgg 


agagcaatgg 


ctagtgagtt 


taatctgcca 


2220 


cccatagtag 


caaaagaaat 


agtagctagc 


tgtgataaat 


gtcagctaaa 


aggggaagcc 


2280 


atacatggac 


aagtagactg 


tagtccaggg 


atatggcaat 


tagattgtac 


acatttagaa 


2340 


ggaaaaatca 


tcctggtagc 


agtccatgta 


gccagtggct 


acatagaagc 


agaggttatc 


2400 


ccagcagaaa 


caggacaaga 


aacagcatac 


tttatactaa 


aattagcagg 


aagatggcca 


2460 


gtcaaagtaa 


tacatacaga 


caatggcagt 


aatttcacca 


gtgctgcagt 


taaagcagcc 


2520 


tgttggtggg 


caggtatcca 


acaggaattt 


ggaattccct 


acaatcccca 


aagtcaggga 


2580 


gtagtagaat 


ccatgaataa 


agaattaaag 


aaaatcatag 


ggcaggtaag 


agatcaagct 


2640 


gagcacctta 


agacagcagt 


acaaatggca 


gtattcattc 


acaattttaa 


aagaaaaggg 


2700 


gggattgggg 


ggtacagtgc 


aggggaaaga 


ataatagaca 


taatagcaac 


agacatacaa 


2760 


actaaagaat 


tacaaaaaca 


aattataaaa 


attcaaaatt 


ttcgggttta 


ttacagagac 


2820 


agcagagacc 


ctatttggaa 


aggaccagcc 


aaactactct 


ggaaaggtga 


aggggcagta 


2880 


gtaatacaag 


ataacagtga 


cataaaggta 


gtaccaagga 


ggaaagcaaa 


aatcattaag 


2940 


gactatggaa 


aacagatggc 


aggtgctgat 


tgtgtggcag 


gtagacagga 


tgaagattag 


3000 



<210> 81 
<211> 381 
<212> DNA 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstructions of clade C rev gene
<400> 81 

atggcaggaa gaagcggaga cagcgacgaa gcgctcctcc aagcagtgag gatcatcaaa 60 
atcctatatc aaagcaaccc ttaccccaaa cccgagggga cccgacaggc tcgaaggaat 120

cgaagaagaa ggtggagagc aagacagaga cagatccatt cgattagtga gcggattctt 180 

agcacttgcc tgggacgacc tgcggagcct gtgcctcttc agctaccacc gcttgagaga 240 

cttcatcttg attgcagcga ggactgtgga acttctggga cgcagcagtc tcaggggact 300 

acagaggggg tgggaagccc ttaaatatct gggaagcctt gtgcagtatt ggggtcagga 360 

gctaaaaaag agtgctatta g 381 

<210> 82 

<211> 381 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Least squares reconstruction of clade C rev gene
<400> 82 

atggcaggaa gaagcggaga cagcgacgaa gcgctcctcc aagcagtgag gatcatcaaa 60 

atcttatatc aaagcaaccc ttaccccaaa cccgagggga cccgacaggc tcggaagaat 120 

cgaagaagaa ggtggagagc aagacagaga cagatccatt cgattagtga gcggattctt 180 

agcacttgcc tgggacgacc tgcggagcct gtgcctcttc agctaccacc gattgagaga 240 

cttcatattg gtgacagcga gagcagtgga acttctggga cgcagcagtc tcaggggact 300 

acagaggggg tgggaagccc ttaagtatct gggaagtctt gtgcagtatt ggggtctgga 360 

actaaaaaag agtgctatta g 381 

<210> 83 

<211> 381 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Minimum of means reconstruction of clade C rev gene
<400> 83 

atggcaggaa gaagcggaga cagcgacgaa gcgctcctcc aagcagtgag gatcatcaaa 60 

atcctatatc aaagcaaccc ttaccccaaa cccgagggga cccgacaggc tcggaagaat 120 

cgaagaagaa ggtggagagc aagacagaga cagatccatt cgattagtga gcggattctt 180 

agcacttgcc tgggacgacc tgcggagcct gtgccttttc agctaccacc gattgagaga 240 

cttcatattg gtgacagcga gagcagtgga acttctggga cgcagcagtc tcaggggact 300 
acagaggggg tgggaagccc ttaagtatct gggaagtctt gtgcagtatt ggggtctgga 360

actaaaaaag agtgctatta g 381



<210> 84 

<211> 306 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade C tat gene sequence

<400> 84 

atggagccag tagatcctaa cctagagccc tggaaccatc caggaagtca gcctaaaact 60 

gcttgtaata aatgttattg taaaaaatgt agctatcatt gtctagtttg ctttctgaca 120 

aaaggcttag gcatttccta tggcaggaag aagcggagac agcgacgaag agctcctcca 180 

agcagtgagg atcatcaaaa tcctatatca aagcaaccct tatcccaaac ccgaggggac 240 

ccgacaggct cggaggaatc gaagaagaag gtggagagca agacagagac agatccgtgc 300 

gattag 306 

<210> 85 

<211> 306 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Least squares of reconstruction of clade C tat gene sequence 
<400> 85 

atggagccag tagatcctaa cctagagccc tggaaccatc caggaagtca gcctaaaact 60 

ccttgtaata agtgttattg taaacactgt agctatcatt gtctagtttg ctttcagaca 120 

aaaggcttag gcatttccta tggcaggaag aagcggagac agcgacgaag cgctcctcca 180 

agcagtgagg atcatcaaaa tcctatatca aagcaaccct taccccaaac ccgaggggac 240 

ccgacaggct cggaagaatc gaagaagaag gtggagagca agacagagac agatccattc 300 

gattag 306 



<210> 86 
<211> 306 
<212> DNA 
<213> Artificial sequence



<220> 

<223> Minimum of means reconstructions of clade C tat gene sequence 

<400> 86 

atggagccag tagatcctaa cctagagccc tggaaccatc caggaagtca gcctaaaact 60 

ccttgtaata agtgttattg taaacactgt agctatcatt gtctagtttg ctttcagaca 120 

aaaggcttag gcatttccta tggcaggaag aagcggagac agcgacgaag cgctcctcca 180 

agcagtgagg atcatcaaaa tcctatatca aagcaaccct taccccaaac ccgaggggac 240 

ccgacaggct cggaggaatc gaagaagaag gtggagagca agacagagac agatccattc 300 

gattag 306 

<210> 87 

<211> 579 

<212> DNA 

<213> Artificial sequence 



<220> 



<223> Most recent common ancestor reconstructions of clade C vif gene sequence



<400> 87 
atggaaaaca 


gatggcaggt 


gctgattgtg 


tggcaggtag 


acaggatgaa 


gattagaaca 


60 


tggaatagtt 


tagtaaaaca 


ccatatgtat 


gtttcaagga 


gagctaaagg 


atggttttat 


120 


agacatcact 


atgaaagcag 


acatccaaaa 


ataagttcag 


aagtacacat 


cccattaggg 


180 


gatgctagat 


tagtaataaa 


aacatattgg 


ggtttgcata 


caggagaaag 


agattggcat 


240 


ttgggtcatg 


gagtctccat 


agaatggaga 


ctgagaagat 


atagcacaca 


agtagaccct 


300 


ggcctggcag 


accaactaat 


tcatatgcat 


tattttgatt 


gttttgcaga 


ctctgccata 


360 


aggaaagcca 


tattaggaca 


tatagttagc 


cctaggtgtg 


actatcaagc 


aggacataac 


420 


aaggtaggat 


ctctacaata 


cttggcactg 


acagcattaa 


taaaaccaaa 


aaagataaag 


480 


ccacctctgc 


ctagtgttaa 


gaaattagta 


gaggatagat 


ggaacaagcc 


ccagaagacc 


540 


aggggccaca 


gagggagcca 


tacaatgaat 


ggacactag 






579 



<210> 88 

<211> 579 

<212> DNA 

<213> Artificial sequence 
<220>

<223> Least squares of reconstruction of clade C vif gene sequence 



<400> 88 
atggaaaaca 


gatggcaggt 


gctgattgtg 


tggcaggtag 


acaggatgaa 


gattagaaca 


60 


tggaatagtt 


tagtaaagca 


ccatatgtat 


gtttcaagga gagctaatgg 


atggttttac 


120 


agacatcatt 


atgaaagcag 


acatccaaaa 


gtaagttcag 


aagtacacat 


cccattaggg 


180 


gatgctagat 


tagtaataaa 


aacatattgg 


ggtttgcaaa 


caggagaaag 


agattggcat 


240 


ttgggtcatg 


gagtctccat 


agaatggaga 


ttgagaagat 


atagcacaca 


agtagaccct 


300 


ggcctggcag 


accagctaat 


tcatatgcat 


tattttgatt 


gttttgcaga 


ctctgccata 


360 


agaaaagcca 


tattaggaca 


catagttatt 


cctaggtgtg 


actatcaagc 


aggacataat 


420 


aaggtaggat 


ctctacaata 


cttggcactg 


acagcattga 


taaaaccaaa 


aaagataaag 


480 


ccacctctgc 


ctagtgttag 


gaaattagta 


gaggatagat 


ggaacaagcc 


ccagaagacc 


540 


aggggccgca 


gagggaacca 


tacaatgaat 


ggacactag 






579 



<210> 89 

<211> 579 

<212> DNA 

<213> Artificial sequence 
<220> 



<223> Minimum of means center of tree reconstruction of clade C vif gene sequence



<400> 89 
atggaaaaca gatggcaggt gctgattgtg tggcaggtag acaggatgaa gattagaaca 60

tggaatagtt tagtaaagca ccatatgtat gtttcaagga gagctaatgg atggttttac 120

agacatcatt atgaaagcag acatccaaaa gtaagttcag aagtacacat cccattaggg 180

gatgctagat tagtaataaa aacatattgg ggtttgcata caggagaaag agattggcat 240

ttgggtcatg gagtctccat agaatggaga ttgagaagat atagcacaca agtagaccct 300

ggcctggcag accagctaat tcatatgcat tattttgatt gttttgcaga ctctgccata 360

agaaaagcca tattaggaca catagttatt cctaggtgtg actatcaagc aggacataat 420

aaggtaggat ctctacaata cttggcactg acagcattga taaaaccaaa aaagataaag 480

ccacctctgc ctagtgttag gaaattagta gaggatagat ggaacaagcc ccagaagacc 540

aggggccgca gagggaacca tacaatgaat ggacactag 579



<210> 90 
<211> 291 
<212> DNA

<213> Artificial sequence



<220> 

<223> Most recent common ancestor reconstructions for clade C vpr gene sequence

<400> 90 

atggaacaag ccccagaaga ccaggggcca cagagggagc catacaatga atggacacta 60 

gagcttttag aggaacttaa gcaggaagct gtcagacatt ttcctagacc atggctccat 120 

agcttaggac aacatatcta tgaaacctat ggggatactt gggcgggagt tgaagctata 180 

ataagaattc tgcaacaact actgtttatt catttcagaa ttgggtgcca acatagcaga 240 

ataggcatta ttcgacagag aagagcaaga aatggagcca gtagatccta a 291 

<210> 91 

<211> 291 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares center of tree reconstruction of clade C vpr gene sequence

<400> 91 

atggaacaag ccccagaaga ccaggggccg cagagggaac catacaatga atggacacta 60 

gagattttag aggaactcaa gcaggaagct gtcagacact ttcctagacc atggctccat 120 

agcttaggac aatatatcta tgaaacctat ggggatactt ggacaggagt cgaagctcta 180 

ataagaatac tgcaacaact actgtttatt catttcagaa ttgggtgcca gcatagcaga 240 

ataggcattt tgcgacagag aagagcaaga aatggagcca gtagatccta a 291 

<210> 92 

<211> 291 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Minimum of means center of tree reconstruction of clade C vpr gene sequence

<400> 92 

atggaacaag ccccagaaga ccaggggccg cagagggaac catacaatga atggacacta 60 

gagcttttag aggaactcaa gcaggaagct gtcagacact ttcctagacc atggctccat 120

agcttaggac aacatatcta tgaaacctat ggggatactt ggacgggagt tgaagctcta 180 

ataagaattc tgcaacaact actgtttatt catttcagaa ttgggtgcca gcatagcaga 240 

ataggcatta tgcgacagag aagagcaaga aatggagcca gtagatccta a 291 

<210> 93 

<211> 261 

<212> DNA 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade C vpu gene sequence

<400> 93 

atgttagatt taatagcaag agtagattat agattaggag taggagcatt gatagtagca 60 

ctaatcatag caatagttgt gtggaccata gtatatatag aatataggaa attggtaaga 120 

caaagaaaaa tagactggtt aattaaaaga attagggaaa gagcagaaga cagtggcaat 180 

gagagtgatg gggatacaga ggaattgtca acactggtgg atatggggca tcttaggctt 240 

ttggatgtta atgatttgta a 261 

<210> 94 

<211> 261 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Least squares center of tree reconstruction of clade C vpu gene sequence

<400> 94 

atgttagatt tactagcaag agtagattat agattaggag taggagcatt gatagtagca 60 

ctaatcatag caatagttgt gtggaccata gtatatatag aatataggaa attgttaaga 120 

caaagaaaaa tagactggtt aattaaaaga attagggaaa gagcagaaga cagtggcaat 180 

gagagtgagg gggatactga ggaattgtca acaatggtgg atatggggca tcttaggctt 240 

ttggatgtta atgatttgta a 261 

<210> 95 
<211> 261 
<212> DNA

<213> Artificial sequence 



<220> 

<223> Minimum of means center of tree reconstruction of clade C vpu gene sequence

<400> 95 

atgttagatt tactagcaag agtagattat agattaggag taggagcatt gatagtagca 60 

ctaatcatag caatagttgt gtggaccata gtatatatag aatataggaa attgttaaga 120 

caaagaaaaa tagactggtt aattaaaaga attagggaaa gagcagaaga cagtggcaat 180 

gagagtgagg gggatactga ggaattatca acaatggtgg atatggggca tcttaggctt 240 

ttggatgtta atgatttgta a 261 

<210> 96 

<211> 492 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade C gag protein 
sequence 

<400> 96 

Met Gly Ala Arg Ala Ser Ile Leu Arg Gly Gly Lys Leu Asp Thr Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys His Tyr Met Ile Lys
20 25 30

His Leu Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Leu Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Lys Gln Ile Ile Lys Gln Leu
50 55 60

Gln Pro Ala Leu Gln Thr Gly Thr Glu Glu Leu Lys Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Thr Leu Tyr Cys Val His Gln Arg Ile Glu Val Arg Asp
85 90 95

Thr Lys Glu Ala Leu Asp Lys Ile Glu Glu Glu Gln Asn Lys Ser Gln
100 105 110

Gln Lys Thr Gln Gln Ala Glu Ala Ala Asp Gly Lys Val Ser Gln Asn
115 120 125

Tyr Pro Ile Val Gln Asn Leu Gln Gly Gln Met Val His Gln Ala Ile
130 135 140

Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Ile Glu Glu Lys Ala
145 150 155 160

Phe Ser Pro Glu Val Ile Pro Met Phe Thr Ala Leu Ser Glu Gly Ala
165 170 175

Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly Gly His Gln
180 185 190

Ala Ala Met Gln Met Leu Lys Asp Thr Ile Asn Glu Glu Ala Ala Glu
195 200 205

Trp Asp Arg Leu His Pro Val His Ala Gly Pro Val Ala Pro Gly Gln
210 215 220

Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr Ser Thr Leu
225 230 235 240

Gln Glu Gln Ile Ala Trp Met Thr Ser Asn Pro Pro Ile Pro Val Gly
245 250 255

Asp Ile Tyr Lys Arg Trp Ile Ile Leu Gly Leu Asn Lys Ile Val Arg
260 265 270

Met Tyr Ser Pro Val Ser Ile Leu Asp Ile Lys Gln Gly Pro Lys Glu
275 280 285

Pro Phe Arg Asp Tyr Val Asp Arg Phe Phe Lys Thr Leu Arg Ala Glu
290 295 300

Gln Ala Thr Gln Asp Val Lys Asn Trp Met Thr Asp Thr Leu Leu Val
305 310 315 320

Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Arg Ala Leu Gly Pro
325 330 335

Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly Val Gly Gly
340 345 350

Pro Ser His Lys Ala Arg Val Leu Ala Glu Ala Met Ser Gln Ala Asn
355 360 365

Asn Thr Asn Ile Met Met Gln Arg Gly Asn Phe Lys Gly Pro Arg Arg
370 375 380

Ile Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His Ile Ala Arg Asn
385 390 395 400

Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys Gly Lys Glu Gly
405 410 415

His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn Phe Leu Gly Lys
420 425 430

Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe Leu Gln Ser Arg
435 440 445

Pro Glu Pro Thr Ala Pro Pro Ala Glu Ser Phe Arg Phe Glu Glu Thr
450 455 460

Thr Pro Ala Pro Lys Gln Glu Pro Lys Asp Arg Glu Pro Leu Thr Ser
465 470 475 480

Leu Lys Ser Leu Phe Gly Ser Asp Pro Leu Ser Gln
485 490

<210> 97 
<211> 492 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Least squares center of tree reconstruction of clade C gag protein sequence

<400> 97 

Met Gly Ala Arg Ala Ser Ile Leu Arg Gly Gly Lys Leu Asp Thr Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys His Tyr Met Leu Lys
20 25 30

His Leu Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Leu Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Lys Gln Ile Met Lys Gln Leu
50 55 60

Gln Pro Ala Leu Gln Thr Gly Thr Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Thr Leu Tyr Cys Val His Glu Lys Ile Glu Val Arg Asp
85 90 95

Thr Lys Glu Ala Leu Asp Lys Ile Glu Glu Glu Gln Asn Lys Ser Gln
100 105 110

Gln Lys Thr Gln Gln Ala Glu Ala Ala Asp Gly Lys Val Ser Gln Asn
115 120 125

Tyr Pro Ile Val Gln Asn Leu Gln Gly Gln Met Val His Gln Ala Ile
130 135 140

Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Ile Glu Glu Lys Ala
145 150 155 160

Phe Ser Pro Glu Val Ile Pro Met Phe Thr Ala Leu Ser Glu Gly Ala
165 170 175

Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly Gly His Gln
180 185 190

Ala Ala Met Gln Met Leu Lys Asp Thr Ile Asn Glu Glu Ala Ala Glu
195 200 205

Trp Asp Arg Leu His Pro Val His Ala Gly Pro Val Ala Pro Gly Gln
210 215 220

Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr Ser Thr Leu
225 230 235 240

Gln Glu Gln Ile Ala Trp Met Thr Ser Asn Pro Pro Val Pro Val Gly
245 250 255

Asp Ile Tyr Lys Arg Trp Ile Ile Leu Gly Leu Asn Lys Ile Val Arg
260 265 270

Met Tyr Ser Pro Val Ser Ile Leu Asp Ile Lys Gln Gly Pro Lys Glu
275 280 285

Pro Phe Arg Asp Tyr Val Asp Arg Phe Phe Lys Thr Leu Arg Ala Glu
290 295 300

Gln Ala Thr Gln Asp Val Lys Asn Trp Met Thr Asp Thr Leu Leu Val
305 310 315 320

Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Arg Ala Leu Gly Pro
325 330 335

Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly Val Gly Gly
340 345 350

Pro Gly His Lys Ala Arg Val Leu Ala Glu Ala Met Ser Gln Ala Asn
355 360 365

Asn Thr Asn Ile Met Met Gln Arg Ser Asn Phe Lys Gly Pro Lys Arg
370 375 380

Ile Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His Ile Ala Arg Asn
385 390 395 400

Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys Gly Lys Glu Gly
405 410 415

His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn Phe Leu Gly Lys
420 425 430

Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe Leu Gln Ser Arg
435 440 445

Pro Glu Pro Thr Ala Pro Pro Ala Glu Ser Phe Arg Phe Glu Glu Thr
450 455 460

Thr Pro Ala Pro Lys Gln Glu Pro Lys Asp Arg Glu Pro Leu Thr Ser
465 470 475 480

Leu Lys Ser Leu Phe Gly Ser Asp Pro Leu Ser Gln
485 490

<210> 98 
<211> 493 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Minimum of means center of tree reconstruction of clade C gag protein sequence

<400> 98 

Met Gly Ala Arg Ala Ser Ile Leu Arg Gly Gly Lys Leu Asp Thr Trp
1 5 10 15

Glu Lys Ile Arg Leu Arg Pro Gly Gly Lys Lys His Tyr Met Leu Lys
20 25 30

His Leu Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Leu Asn Pro
35 40 45

Gly Leu Leu Glu Thr Ser Glu Gly Cys Lys Gln Ile Met Lys Gln Leu
50 55 60

Gln Pro Ala Leu Gln Thr Gly Thr Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

Thr Val Ala Thr Leu Tyr Cys Val His Glu Lys Ile Glu Val Arg Asp
85 90 95

Thr Lys Glu Ala Leu Asp Lys Ile Glu Glu Glu Gln Asn Lys Ser Gln
100 105 110

Gln Lys Thr Gln Gln Ala Glu Ala Ala Ala Asp Gly Lys Val Ser Gln
115 120 125

Asn Tyr Pro Ile Val Gln Asn Leu Gln Gly Gln Met Val His Gln Ala
130 135 140

Ile Ser Pro Arg Thr Leu Asn Ala Trp Val Lys Val Ile Glu Glu Lys
145 150 155 160

Ala Phe Ser Pro Glu Val Ile Pro Met Phe Thr Ala Leu Ser Glu Gly
165 170 175

Ala Thr Pro Gln Asp Leu Asn Thr Met Leu Asn Thr Val Gly Gly His
180 185 190

Gln Ala Ala Met Gln Met Leu Lys Asp Thr Ile Asn Glu Glu Ala Ala
195 200 205

Glu Trp Asp Arg Leu His Pro Val His Ala Gly Pro Val Ala Pro Gly
210 215 220

Gln Met Arg Glu Pro Arg Gly Ser Asp Ile Ala Gly Thr Thr Ser Thr
225 230 235 240

Leu Gln Glu Gln Ile Ala Trp Met Thr Ser Asn Pro Pro Val Pro Val
245 250 255

Gly Asp Ile Tyr Lys Arg Trp Ile Ile Leu Gly Leu Asn Lys Ile Val
260 265 270

Arg Met Tyr Ser Pro Val Ser Ile Leu Asp Ile Lys Gln Gly Pro Lys
275 280 285

Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Phe Lys Thr Leu Arg Ala
290 295 300

Glu Gln Ala Thr Gln Asp Val Lys Asn Trp Met Thr Asp Thr Leu Leu
305 310 315 320

Val Gln Asn Ala Asn Pro Asp Cys Lys Thr Ile Leu Arg Ala Leu Gly
325 330 335

Pro Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys Gln Gly Val Gly
340 345 350

Gly Pro Gly His Lys Ala Arg Val Leu Ala Glu Ala Met Ser Gln Ala
355 360 365

Asn Asn Thr Asn Ile Met Met Gln Arg Ser Asn Phe Lys Gly Pro Lys
370 375 380

Arg Ile Val Lys Cys Phe Asn Cys Gly Lys Glu Gly His Ile Ala Arg
385 390 395 400

Asn Cys Arg Ala Pro Arg Lys Lys Gly Cys Trp Lys Cys Gly Lys Glu
405 410 415

Gly His Gln Met Lys Asp Cys Thr Glu Arg Gln Ala Asn Phe Leu Gly
420 425 430

Lys Ile Trp Pro Ser His Lys Gly Arg Pro Gly Asn Phe Leu Gln Ser
435 440 445

Arg Pro Glu Pro Thr Ala Pro Pro Ala Glu Ser Phe Arg Phe Glu Glu
450 455 460

Thr Thr Pro Ala Pro Lys Gln Glu Pro Lys Asp Arg Glu Pro Leu Thr
465 470 475 480

Ser Leu Lys Ser Leu Phe Gly Ser Asp Pro Leu Ser Gln
485 490

<210> 99 
<211> 849 
<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor reconstruction of clade C gp160 protein sequence

<400> 99 

Met Arg Val Met Gly Ile Gln Arg Asn Cys Gln Gln Trp Trp Ile Trp
1 5 10 15

Gly Ile Leu Gly Phe Trp Met Leu Met Ile Cys Ser Val Val Gly Asn
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Lys
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Glu Arg Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Met Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asp Met Val Asp Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Thr Asn Val Asn Asn Thr Asn Asn Thr Asn Ser Thr Met Asn
130 135 140

Gly Glu Met Lys Asn Cys Ser Phe Asn Ile Thr Thr Glu Ile Arg Asp
145 150 155 160

Lys Lys Lys Lys Glu Tyr Ala Leu Phe Tyr Arg Leu Asp Ile Val Pro
165 170 175

Leu Asn Glu Asn Asn Asn Asn Thr Ser Glu Tyr Arg Leu Ile Asn Cys
180 185 190

Asn Thr Ser Ala Ile Thr Gln Ala Cys Pro Lys Val Ser Phe Asp Pro
195 200 205

Ile Pro Ile His Tyr Cys Ala Pro Ala Gly Tyr Ala Ile Leu Lys Cys
210 215 220

Asn Asn Lys Thr Phe Asn Gly Thr Gly Pro Cys Lys Asn Val Ser Thr
225 230 235 240

Val Gln Cys Thr His Gly Ile Lys Pro Val Val Ser Thr Gln Leu Leu
245 250 255

Leu Asn Gly Ser Leu Ala Glu Glu Glu Ile Ile Ile Arg Ser Glu Asn
260 265 270

Leu Thr Asn Asn Ala Lys Thr Ile Ile Val Gln Leu Asn Glu Ser Val
275 280 285

Glu Ile Val Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Met Arg
290 295 300

Ile Gly Pro Gly Gln Thr Phe Tyr Ala Thr Gly Asp Ile Ile Gly Asp
305 310 315 320

Ile Arg Gln Ala His Cys Asn Ile Ser Gly Arg Glu Trp Asn Asn Thr
325 330 335

Leu Gln Gln Val Ala Glu Lys Leu Arg Lys His Phe Pro Asn Lys Thr
340 345 350

Ile Lys Phe Ala Pro Ser Ser Gly Gly Asp Leu Glu Ile Thr Thr His
355 360 365

Ser Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys Asn Thr Ser Lys Leu
370 375 380

Phe Asn Ser Thr Tyr Asn Ser Thr Asn Ser Thr Asn Ser Thr Ile Thr
385 390 395 400

Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met Trp Gln Gly Val Gly
405 410 415

Gln Ala Met Tyr Ala Pro Pro Ile Ala Gly Asn Ile Thr Cys Lys Ser
420 425 430

Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly Gly Lys Asn Glu Thr
435 440 445

Asn Glu Thr Glu Thr Phe Arg Pro Gly Gly Gly Asp Met Arg Asp Asn
450 455 460

Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Glu Ile Lys Pro Leu
465 470 475 480

Gly Val Ala Pro Thr Lys Ala Lys Arg Arg Val Val Glu Arg Glu Lys
485 490 495

Arg Ala Val Gly Leu Gly Ala Val Phe Leu Gly Phe Leu Gly Ala Ala
500 505 510

Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr Val Gln Ala Arg
515 520 525

Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Ser Asn Leu Leu Arg Ala
530 535 540

Ile Glu Ala Gln Gln His Met Leu Gln Leu Thr Val Trp Gly Ile Lys
545 550 555 560

Gln Leu Gln Ala Arg Val Leu Ala Met Glu Arg Tyr Leu Lys Asp Gln
565 570 575

Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu Ile Cys Thr Thr
580 585 590

Ala Val Pro Trp Asn Ser Ser Trp Ser Asn Lys Ser Gln Asp Asp Ile
595 600 605

Trp Asp Asn Met Thr Trp Met Glu Trp Asp Arg Glu Ile Asn Asn Tyr
610 615 620

Thr Asp Thr Ile Tyr Arg Leu Leu Glu Glu Ser Gln Asn Gln Gln Glu
625 630 635 640

Lys Asn Glu Gln Asp Leu Leu Ala Leu Asp Ser Trp Glu Asn Leu Trp
645 650 655

Asn Trp Phe Asp Ile Ser Asn Trp Leu Trp Tyr Ile Lys Ile Phe Ile
660 665 670

Met Ile Val Gly Gly Leu Ile Gly Leu Arg Ile Ile Phe Ala Val Leu
675 680 685

Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro Leu Ser Phe Gln
690 695 700

Thr Leu Thr Pro Asn Pro Arg Gly Pro Asp Arg Leu Glu Arg Ile Glu
705 710 715 720

Glu Glu Gly Gly Glu Gln Asp Arg Asp Arg Ser Ile Arg Leu Val Ser
725 730 735

Gly Phe Leu Ala Leu Ala Trp Asp Asp Leu Arg Ser Leu Cys Leu Phe
740 745 750

Ser Tyr His Arg Leu Arg Asp Phe Ile Leu Ile Ala Ala Arg Thr Val
755 760 765

Glu Leu Leu Gly Arg Ser Ser Leu Arg Gly Leu Gln Arg Gly Trp Glu
770 775 780

Ala Leu Lys Tyr Leu Gly Ser Leu Val Gln Tyr Trp Gly Gln Glu Leu
785 790 795 800

Lys Lys Ser Ala Ile Ser Leu Leu Asp Thr Ile Ala Ile Ala Val Ala
805 810 815

Glu Gly Thr Asp Arg Ile Ile Glu Val Val Gln Arg Ala Cys Arg Ala
820 825 830

Ile Leu Asn Ile Pro Arg Arg Ile Arg Gln Gly Phe Glu Ala Ala Leu
835 840 845

Gln



<210> 100 
<211> 849
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Least squares of center of tree reconstructions of clade C gp160 protein sequence

<400> 100 

Met Arg Val Arg Gly Ile Leu Arg Asn Cys Gln Gln Trp Trp Ile Trp
1 5 10 15

Gly Ile Leu Gly Phe Trp Met Leu Met Ile Cys Asn Val Val Gly Asn
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Lys
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Glu Lys Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Met Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asp Met Val Asp Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Ser Asn Val Asn Ala Thr Asn Thr Thr Asn Asn Thr Met Lys
130 135 140

Gly Glu Ile Lys Asn Cys Ser Phe Asn Ala Thr Thr Glu Ile Arg Asp
145 150 155 160

Lys Lys Gln Lys Val Tyr Ala Leu Phe Tyr Arg Leu Asp Ile Val Pro
165 170 175

Leu Asn Glu Asn Asn Ser Asn Ser Ser Glu Tyr Arg Leu Ile Asn Cys
180 185 190

Asn Thr Ser Ala Ile Thr Gln Ala Cys Pro Lys Val Ser Phe Asp Pro
195 200 205

Ile Pro Ile His Tyr Cys Ala Pro Ala Gly Tyr Ala Ile Leu Lys Cys
210 215 220

Asn Asn Lys Thr Phe Asn Gly Thr Gly Pro Cys Asn Asn Val Ser Thr
225 230 235 240

Val Gln Cys Thr His Gly Ile Lys Pro Val Val Ser Thr Gln Leu Leu
245 250 255

Leu Asn Gly Ser Leu Ala Glu Glu Glu Ile Ile Ile Arg Ser Glu Asn
260 265 270

Leu Thr Asn Asn Val Lys Thr Ile Ile Val His Leu Asn Glu Ser Val
275 280 285

Glu Ile Val Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Ile Arg
290 295 300

Ile Gly Pro Gly Gln Thr Phe Tyr Ala Thr Gly Asp Ile Ile Gly Asp
305 310 315 320

Ile Arg Gln Ala His Cys Asn Ile Ser Glu Glu Glu Trp Asn Lys Thr
325 330 335

Leu Gln Arg Val Gly Lys Lys Leu Glu Glu His Phe Pro Asn Lys Thr
340 345 350

Ile Lys Phe Glu Pro Ser Ser Gly Gly Asp Leu Glu Ile Thr Thr His
355 360 365

Ser Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys Asn Thr Ser Lys Leu
370 375 380

Phe Asn Ser Thr Tyr Asn Gly Thr Asn Ser Thr Asn Thr Thr Ile Thr
385 390 395 400

Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met Trp Gln Glu Val Gly
405 410 415

Arg Ala Met Tyr Ala Pro Pro Ile Ala Gly Asn Ile Thr Cys Lys Ser
420 425 430

Asn Ile Thr Gly Leu Leu Leu Val Arg Asp Gly Gly Lys Asn Asn Thr
435 440 445

Asn Asn Thr Glu Ile Phe Arg Pro Gly Gly Gly Asp Met Arg Asp Asn
450 455 460

Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Glu Ile Lys Pro Leu
465 470 475 480

Gly Ile Ala Pro Thr Lys Ala Lys Arg Arg Val Val Glu Arg Glu Lys
485 490 495

Arg Ala Val Gly Ile Gly Ala Val Phe Leu Gly Phe Leu Gly Ala Ala
500 505 510

Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr Val Gln Ala Arg
515 520 525

Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Ser Asn Leu Leu Arg Ala
530 535 540

Ile Glu Ala Gln Gln His Met Leu Gln Leu Thr Val Trp Gly Ile Lys
545 550 555 560

Gln Leu Gln Thr Arg Val Leu Ala Ile Glu Arg Tyr Leu Lys Asp Gln
565 570 575

Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu Ile Cys Thr Thr
580 585 590

Ala Val Pro Trp Asn Ser Ser Trp Ser Asn Lys Ser Gln Glu Asp Ile
595 600 605

Trp Asp Asn Met Thr Trp Met Gln Trp Asp Arg Glu Ile Ser Asn Tyr
610 615 620

Thr Asp Thr Ile Tyr Arg Leu Leu Glu Asp Ser Gln Asn Gln Gln Glu
625 630 635 640

Gln Asn Glu Lys Asp Leu Leu Ala Leu Asp Ser Trp Lys Asn Leu Trp
645 650 655

Asn Trp Phe Asp Ile Thr Asn Trp Leu Trp Tyr Ile Lys Ile Phe Ile
660 665 670

Met Ile Val Gly Gly Leu Ile Gly Leu Arg Ile Ile Phe Ala Val Leu
675 680 685

Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro Leu Ser Phe Gln
690 695 700

Thr Leu Thr Pro Asn Pro Arg Gly Pro Asp Arg Leu Gly Arg Ile Glu
705 710 715 720

Glu Glu Gly Gly Glu Gln Asp Arg Asp Arg Ser Ile Arg Leu Val Ser
725 730 735

Gly Phe Leu Ala Leu Ala Trp Asp Asp Leu Arg Ser Leu Cys Leu Phe
740 745 750

Ser Tyr His Arg Leu Arg Asp Phe Ile Leu Val Ala Ala Arg Ala Val
755 760 765

Glu Leu Leu Gly Arg Ser Ser Leu Arg Gly Leu Gln Arg Gly Trp Glu
770 775 780

Ala Leu Lys Tyr Leu Gly Ser Leu Val Gln Tyr Trp Gly Leu Glu Leu
785 790 795 800

Lys Lys Ser Ala Ile Ser Leu Leu Asp Thr Ile Ala Ile Ala Val Ala
805 810 815

Glu Gly Thr Asp Arg Ile Ile Glu Leu Ile Gln Arg Ile Cys Arg Ala
820 825 830

Ile Arg Asn Ile Pro Arg Arg Ile Arg Gln Gly Phe Glu Ala Ala Leu
835 840 845

Gln



<210> 101 

<211> 849 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> Minimum of means of center of tree reconstructions for clade C gp160 protein sequence

<400> 101 

Met Arg Val Arg Gly Ile Leu Arg Asn Cys Gln Gln Trp Trp Ile Trp
1 5 10 15

Gly Ile Leu Gly Phe Trp Met Leu Met Ile Cys Asn Val Val Gly Asn
20 25 30

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Lys
35 40 45

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Glu Lys Glu Val
50 55 60

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro
65 70 75 80

Gln Glu Met Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys
85 90 95

Asn Asp Met Val Asp Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp
100 105 110

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu
115 120 125

Asn Cys Ser Asn Val Asn Thr Thr Asn Thr Thr Asn Asn Thr Met Lys
130 135 140

Gly Glu Ile Lys Asn Cys Ser Phe Asn Val Thr Thr Glu Leu Arg Asp
145 150 155 160

Lys Lys Lys Lys Glu Tyr Ala Leu Phe Tyr Arg Leu Asp Ile Val Pro
165 170 175

Leu Asn Glu Asn Asn Asn Asn Ser Ser Glu Tyr Arg Leu Ile Asn Cys
180 185 190

Asn Thr Ser Ala Ile Thr Gln Ala Cys Pro Lys Val Ser Phe Asp Pro
195 200 205

Ile Pro Ile His Tyr Cys Ala Pro Ala Gly Tyr Ala Ile Leu Lys Cys
210 215 220

Asn Asn Lys Thr Phe Asn Gly Thr Gly Pro Cys Asn Asn Val Ser Thr
225 230 235 240

Val Gln Cys Thr His Gly Ile Lys Pro Val Val Ser Thr Gln Leu Leu
245 250 255

Leu Asn Gly Ser Leu Ala Glu Glu Glu Ile Ile Ile Arg Ser Glu Asn
260 265 270

Leu Thr Asn Asn Ala Lys Thr Ile Ile Val His Leu Asn Glu Ser Val
275 280 285

Glu Ile Val Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Ile Arg
290 295 300

Ile Gly Pro Gly Gln Thr Phe Tyr Ala Thr Gly Asp Ile Ile Gly Asp
305 310 315 320

Ile Arg Gln Ala His Cys Asn Ile Ser Glu Glu Glu Trp Asn Lys Thr
325 330 335

Leu Gln Arg Val Gly Lys Lys Leu Glu Glu His Phe Pro Asn Lys Thr
340 345 350

Ile Lys Phe Glu Pro Ser Ser Gly Gly Asp Leu Glu Ile Thr Thr His
355 360 365

Ser Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys Asn Thr Ser Lys Leu
370 375 380

Phe Asn Ser Thr Tyr Asn Gly Thr Asn Ser Thr Asn Ser Thr Ile Thr
385 390 395 400

Leu Gln Cys Arg Ile Lys Gln Ile Ile Asn Met Trp Gln Glu Val Gly
405 410 415

Arg Ala Met Tyr Ala Pro Pro Ile Ala Gly Asn Ile Thr Cys Lys Ser
420 425 430

Asn Ile Thr Gly Leu Leu Leu Val Arg Asp Gly Gly Lys Asn Asp Thr
435 440 445

Asn Asp Thr Glu Ile Phe Arg Pro Gly Gly Gly Asp Met Arg Asp Asn
450 455 460

Trp Arg Ser Glu Leu Tyr Lys Tyr Lys Val Val Glu Ile Lys Pro Leu
465 470 475 480

Gly Ile Ala Pro Thr Lys Ala Lys Arg Arg Val Val Glu Arg Glu Lys
485 490 495

Arg Ala Val Gly Ile Gly Ala Val Phe Leu Gly Phe Leu Gly Ala Ala
500 505 510

Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr Val Gln Ala Arg
515 520 525

Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Ser Asn Leu Leu Arg Ala
530 535 540

Ile Glu Ala Gln Gln His Met Leu Gln Leu Thr Val Trp Gly Ile Lys
545 550 555 560

Gln Leu Gln Thr Arg Val Leu Ala Ile Glu Arg Tyr Leu Lys Asp Gln
565 570 575

Gln Leu Leu Gly Ile Trp Gly Cys Ser Gly Lys Leu Ile Cys Thr Thr
580 585 590

Ala Val Pro Trp Asn Ser Ser Trp Ser Asn Lys Ser Gln Glu Asp Ile
595 600 605

Trp Asp Asn Met Thr Trp Met Gln Trp Asp Arg Glu Ile Ser Asn Tyr
610 615 620

Thr Asp Thr Ile Tyr Arg Leu Leu Glu Asp Ser Gln Asn Gln Gln Glu
625 630 635 640

Gln Asn Glu Lys Asp Leu Leu Ala Leu Asp Ser Trp Lys Asn Leu Trp
645 650 655

Asn Trp Phe Asp Ile Thr Asn Trp Leu Trp Tyr Ile Lys Ile Phe Ile
660 665 670

Met Ile Val Gly Gly Leu Ile Gly Leu Arg Ile Ile Phe Ala Val Leu
675 680 685

Ser Ile Val Asn Arg Val Arg Gln Gly Tyr Ser Pro Leu Ser Phe Gln
690 695 700

Thr Leu Thr Pro Asn Pro Arg Gly Pro Asp Arg Leu Gly Arg Ile Glu
705 710 715 720

Glu Glu Gly Gly Glu Gln Asp Arg Asp Arg Ser Ile Arg Leu Val Ser
725 730 735

Gly Phe Leu Ala Leu Ala Trp Asp Asp Leu Arg Ser Leu Cys Leu Phe
740 745 750

Ser Tyr His Arg Leu Arg Asp Phe Ile Leu Val Ala Ala Arg Ala Val
755 760 765

Glu Leu Leu Gly Arg Ser Ser Leu Arg Gly Leu Gln Arg Gly Trp Glu
770 775 780

Ala Leu Lys Tyr Leu Gly Ser Leu Val Gln Tyr Trp Gly Leu Glu Leu
785 790 795 800

Lys Lys Ser Ala Ile Ser Leu Leu Asp Thr Ile Ala Ile Ala Val Ala
805 810 815

Glu Gly Thr Asp Arg Ile Ile Glu Leu Ile Gln Arg Ile Cys Arg Ala
820 825 830

Ile Arg Asn Ile Pro Arg Arg Ile Arg Gln Gly Phe Glu Ala Ala Leu
835 840 845

Gln



<210> 102 

<211> 206 

<212> PRT 

<213> Artificial sequence 



<220> 

<223> Most recent common ancestor of reconstructions of clade C nef protein sequences

<400> 102 

Met Gly Gly Lys Trp Ser Lys Ser Ser Ile Val Gly Trp Pro Ala Val
1 5 10 15

Arg Glu Arg Ile Arg Arg Thr Ala Pro Ala Ala Glu Gly Val Gly Ala
20 25 30

Ala Ser Gln Asp Leu Asp Lys His Gly Ala Leu Thr Ser Ser Asn Thr
35 40 45

Ala Ala Thr Asn Ala Asp Cys Ala Trp Leu Glu Ala Gln Glu Glu Glu
50 55 60

Glu Val Gly Phe Pro Val Arg Pro Gln Val Pro Leu Arg Pro Met Thr
65 70 75 80

Tyr Lys Gly Ala Val Asp Leu Ser Phe Phe Leu Lys Glu Lys Gly Gly
85 90 95

Leu Glu Gly Leu Ile Tyr Ser Lys Lys Arg Gln Glu Ile Leu Asp Leu
100 105 110

Trp Val Tyr His Thr Gln Gly Tyr Phe Pro Asp Trp Gln Asn Tyr Thr
115 120 125

Pro Gly Pro Gly Ile Arg Phe Pro Leu Thr Phe Gly Trp Cys Phe Lys
130 135 140

Leu Val Pro Val Asp Pro Arg Glu Val Glu Glu Ala Asn Glu Gly Glu
145 150 155 160

Asn Asn Cys Leu Leu His Pro Met Ser Gln His Gly Met Glu Asp Glu
165 170 175

Asp Arg Glu Val Leu Lys Trp Lys Phe Asp Ser His Leu Ala Arg Arg
180 185 190

His Met Ala Arg Glu Leu His Pro Glu Tyr Tyr Lys Asp Cys
195 200 205

<210> 103 

<211> 207 

<212> PRT 

<213> Artificial sequence 
<220>

<223> Least squares and minimum of means of center of tree reconstructions of clade C nef protein sequence

<400> 103 

Met Gly Gly Lys Trp Ser Lys Ser Ser Ile Val Gly Trp Pro Ala Val
1 5 10 15

Arg Glu Arg Ile Arg Arg Thr Glu Pro Ala Ala Glu Gly Val Gly Ala
20 25 30

Ala Ser Gln Asp Leu Asp Lys His Gly Ala Leu Thr Ser Ser Asn Thr
35 40 45

Ala Ala Asn Asn Ala Asp Cys Ala Trp Leu Glu Ala Gln Glu Glu Glu
50 55 60

Glu Glu Val Gly Phe Pro Val Arg Pro Gln Val Pro Leu Arg Pro Met
65 70 75 80

Thr Tyr Lys Gly Ala Phe Asp Leu Ser Phe Phe Leu Lys Glu Lys Gly
85 90 95

Gly Leu Glu Gly Leu Ile Tyr Ser Lys Lys Arg Gln Glu Ile Leu Asp
100 105 110

Leu Trp Val Tyr His Thr Gln Gly Tyr Phe Pro Asp Trp Gln Asn Tyr
115 120 125

Thr Pro Gly Pro Gly Val Arg Tyr Pro Leu Thr Phe Gly Trp Cys Phe
130 135 140

Lys Leu Val Pro Val Asp Pro Arg Glu Val Glu Glu Ala Asn Glu Gly
145 150 155 160

Glu Asn Asn Cys Leu Leu His Pro Met Ser Gln His Gly Met Glu Asp
165 170 175

Glu Asp Arg Glu Val Leu Lys Trp Lys Phe Asp Ser His Leu Ala Arg
180 185 190

Arg His Met Ala Arg Glu Leu His Pro Glu Tyr Tyr Lys Asp Cys
195 200 205

<210> 104 
<211> 999 
<212> PRT 

<213> Artificial sequence 
<220>

<223> Most recent common ancestor for reconstruction of clade C pol protein sequences

<400> 104 

Phe Phe Arg Glu Asn Leu Ala Phe Pro Gln Gly Glu Ala Arg Glu Phe
1 5 10 15

Pro Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Ser Arg Glu Leu Gln
20 25 30

Val Arg Gly Asp Asn Pro Arg Ser Glu Ala Gly Ala Glu Arg Gln Gly
35 40 45

Thr Leu Asn Phe Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ser
50 55 60

Ile Lys Val Gly Gly Gln Ile Lys Glu Ala Leu Leu Asp Thr Gly Ala
65 70 75 80

Asp Asp Thr Val Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro
85 90 95

Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp
100 105 110

Gln Ile Leu Ile Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu
115 120 125

Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Met Leu Thr Gln
130 135 140

Leu Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro
145 150 155 160

Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro
165 170 175

Leu Thr Glu Glu Lys Ile Lys Ala Leu Thr Ala Ile Cys Glu Glu Met
180 185 190

Glu Lys Glu Gly Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn
195 200 205

Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys
210 215 220

Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu
225 230 235 240

Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser
245 250 255

Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp
260 265 270

Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn
275 280 285

Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp
290 295 300

Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu
305 310 315 320

Pro Phe Arg Ala Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp
325 330 335

Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys
340 345 350

Ile Glu Glu Leu Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro
355 360 365

Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu
370 375 380

Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Gln Leu Pro Glu Lys
385 390 395 400

Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn
405 410 415

Trp Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Arg Gln Leu Cys Lys
420 425 430

Leu Leu Arg Gly Ala Lys Ala Leu Thr Asp Ile Val Pro Leu Thr Glu
435 440 445

Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro
450 455 460

Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu Ile Ala Glu Ile
465 470 475 480

Gln Lys Gln Gly His Asp Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Pro
485 490 495

Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Lys Met Arg Ser Ala His
500 505 510

Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln Lys Ile Ala Met
515 520 525

Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe Arg Leu Pro Ile
530 535 540

Gln Lys Glu Thr Trp Glu Thr Trp Trp Thr Asp Tyr Trp Gln Ala Thr
545 550 555 560

Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu
565 570 575

Trp Tyr Gln Leu Glu Lys Glu Pro Ile Ala Gly Ala Glu Thr Phe Tyr
580 585 590

Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Leu Gly Lys Ala Gly Tyr
595 600 605

Val Thr Asp Lys Gly Arg Gln Lys Val Val Ser Leu Thr Glu Thr Thr
610 615 620

Asn Gln Lys Thr Glu Leu Gln Ala Ile Gln Leu Ala Leu Gln Asp Ser
625 630 635 640

Gly Ser Glu Val Asn Ile Val Thr Asp Ser Gln Tyr Ala Leu Gly Ile
645 650 655

Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gln Ile
660 665 670

Ile Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu Ser Trp Val Pro
675 680 685

Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser
690 695 700

Ser Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile Asp Lys Ala Gln
705 710 715 720

Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala Ser Glu
725 730 735

Phe Asn Leu Pro Pro Ile Val Ala Lys Glu Ile Val Ala Ser Cys Asp
740 745 750

Lys Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln Val Asp Cys Ser
755 760 765

Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys Val Ile
770 775 780

Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile
785 790 795 800

Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Ile Leu Lys Leu Ala
805 810 815

Gly Arg Trp Pro Val Lys Val Ile His Thr Asp Asn Gly Ser Asn Phe
820 825 830

Thr Ser Ala Ala Val Lys Ala Ala Cys Trp Trp Ala Gly Ile Gln Gln
835 840 845

Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Val Glu Ser
850 855 860

Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala
865 870 875 880

Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe Ile His Asn Phe
885 890 895

Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly Glu Arg Ile Ile
900 905 910

Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile
915 920 925

Ile Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro
930 935 940

Val Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val
945 950 955 960

Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala
965 970 975

Lys Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly Ala Asp Cys Val
980 985 990

Ala Gly Arg Gln Asp Glu Asp
995

<210> 105 

<211> 999 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> Least squares center of tree reconstruction of clade C pol protein sequence 

<400> 105 

Phe Phe Arg Glu Asn Leu Ala Phe Pro Gln Gly Glu Ala Arg Glu Phe 
1 5 10 15 

Pro Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Ser Arg Glu Leu Gln 
20 25 30 

Val Arg Gly Asp Asn Pro Arg Ser Glu Ala Gly Ala Glu Arg Gln Gly 
35 40 45 

Thr Leu Asn Phe Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ser 
50 55 60 

Ile Lys Val Gly Gly Gln Ile Lys Glu Ala Leu Leu Asp Thr Gly Ala 
65 70 75 80 

Asp Asp Thr Val Leu Glu Glu Ile Asn Leu Pro Gly Lys Trp Lys Pro 
85 90 95 

Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp 
100 105 110 

Gln Ile Leu Ile Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu 
115 120 125 

Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Met Leu Thr Gln 
130 135 140 

Leu Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro 
145 150 155 160 

Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro 
165 170 175 

Leu Thr Glu Glu Lys Ile Lys Ala Leu Thr Ala Ile Cys Glu Glu Met 
180 185 190 

Glu Lys Glu Gly Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn 
195 200 205 

Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys 
210 215 220 

Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu 
225 230 235 240 

Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser 
245 250 255 

Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp 
260 265 270 

Glu Gly Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn 
275 280 285 

Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp 
290 295 300 

Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu 
305 310 315 320 

Pro Phe Arg Ala Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp 
325 330 335 

Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys 
340 345 350 

Ile Glu Glu Leu Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro 
355 360 365 

Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu 
370 375 380 

Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Gln Leu Pro Glu Lys 
385 390 395 400 

Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn 
405 410 415 

Trp Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Arg Gln Leu Cys Lys 
420 425 430 

Leu Leu Arg Gly Ala Lys Ala Leu Thr Asp Ile Val Pro Leu Thr Glu 
435 440 445 

Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro 
450 455 460 

Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu Ile Ala Glu Ile 
465 470 475 480 

Gln Lys Gln Gly His Asp Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Pro 
485 490 495 

Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Lys Met Arg Thr Ala His 
500 505 510 

Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln Lys Ile Ala Met 
515 520 525 

Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe Arg Leu Pro Ile 
530 535 540 

Gln Lys Glu Thr Trp Glu Thr Trp Trp Thr Asp Tyr Trp Gln Ala Thr 
545 550 555 560 

Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu 
565 570 575 

Trp Tyr Gln Leu Glu Lys Glu Pro Ile Ala Gly Ala Glu Thr Phe Tyr 
580 585 590 

Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Ile Gly Lys Ala Gly Tyr 
595 600 605 

Val Thr Asp Arg Gly Arg Gln Lys Ile Val Ser Leu Thr Glu Thr Thr 
610 615 620 

Asn Gln Lys Thr Glu Leu Gln Ala Ile Gln Leu Ala Leu Gln Asp Ser 
625 630 635 640 

Gly Ser Glu Val Asn Ile Val Thr Asp Ser Gln Tyr Ala Leu Gly Ile 
645 650 655 

Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gln Ile 
660 665 670 

Ile Glu Gln Leu Ile Lys Lys Glu Arg Val Tyr Leu Ser Trp Val Pro 
675 680 685 

Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser 
690 695 700 

Ser Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile Asp Lys Ala Gln 
705 710 715 720 

Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala Ser Glu 
725 730 735 

Phe Asn Leu Pro Pro Ile Val Ala Lys Glu Ile Val Ala Ser Cys Asp 
740 745 750 

Lys Cys Gln Leu Lys Gly Glu Ala Ile His Gly Gln Val Asp Cys Ser 
755 760 765 

Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys Ile Ile 
770 775 780 

Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile 
785 790 795 800 

Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Ile Leu Lys Leu Ala 
805 810 815 

Gly Arg Trp Pro Val Lys Val Ile His Thr Asp Asn Gly Ser Asn Phe 
820 825 830 

Thr Ser Ala Ala Val Lys Ala Ala Cys Trp Trp Ala Gly Ile Gln Gln 
835 840 845 

Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Val Glu Ser 
850 855 860 

Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala 
865 870 875 880 

Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe Ile His Asn Phe 
885 890 895 

Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly Glu Arg Ile Ile 
900 905 910 

Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile 
915 920 925 

Ile Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro 
930 935 940 

Ile Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val 
945 950 955 960 

Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala 
965 970 975 

Lys Ile Ile Lys Asp Tyr Gly Lys Gln Met Ala Gly Ala Asp Cys Val 
980 985 990 

Ala Gly Arg Gln Asp Glu Asp 
995 

<210> 106 

<211> 999 

<212> PRT 

<213> Artificial sequence 

<220> 

<223> Minimum of means center of tree reconstruction of clade C pol protein sequence 

<400> 106 

Phe Phe Arg Glu Asn Leu Ala Phe Pro Gln Gly Glu Ala Arg Glu Phe 
1 5 10 15 

Pro Ser Glu Gln Thr Arg Ala Asn Ser Pro Thr Ser Arg Glu Leu Gln 
20 25 30 

Val Arg Gly Asp Asn Pro Leu Ser Glu Ala Gly Ala Glu Arg Gln Gly 
35 40 45 

Thr Leu Asn Phe Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr 
50 55 60 

Ile Lys Val Gly Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala 
65 70 75 80 

Asp Asp Thr Val Leu Glu Glu Ile Asn Leu Pro Gly Lys Trp Lys Pro 
85 90 95 

Lys Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp 
100 105 110 

Gln Ile Leu Ile Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu 
115 120 125 

Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Met Leu Thr Gln 
130 135 140 

Leu Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro 
145 150 155 160 

Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro 
165 170 175 

Leu Thr Glu Glu Lys Ile Lys Ala Leu Thr Ala Ile Cys Glu Glu Met 
180 185 190 

Glu Lys Glu Gly Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn 
195 200 205 

Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys 
210 215 220 

Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu 
225 230 235 240 

Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser 
245 250 255 

Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp 
260 265 270 

Glu Gly Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn 
275 280 285 

Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp 
290 295 300 

Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu 
305 310 315 320 

Pro Phe Arg Ala Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp 
325 330 335 

Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys 
340 345 350 

Ile Glu Glu Leu Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro 
355 360 365 

Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu 
370 375 380 

Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Gln Leu Pro Glu Lys 
385 390 395 400 

Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn 
405 410 415 

Trp Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Arg Gln Leu Cys Lys 
420 425 430 

Leu Leu Arg Gly Ala Lys Ala Leu Thr Asp Ile Val Pro Leu Thr Glu 
435 440 445 

Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro 
450 455 460 

Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu Ile Ala Glu Ile 
465 470 475 480 

Gln Lys Gln Gly His Asp Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Pro 
485 490 495 

Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Lys Met Arg Thr Ala His 
500 505 510 

Thr Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln Lys Ile Ala Met 
515 520 525 

Glu Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe Arg Leu Pro Ile 
530 535 540 

Gln Lys Glu Thr Trp Glu Ala Trp Trp Thr Asp Tyr Trp Gln Ala Thr 
545 550 555 560 

Trp Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu 
565 570 575 

Trp Tyr Gln Leu Glu Lys Glu Pro Ile Ala Gly Ala Glu Thr Phe Tyr 
580 585 590 

Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Ile Gly Lys Ala Gly Tyr 
595 600 605 

Val Thr Asp Arg Gly Arg Gln Lys Ile Val Ser Leu Thr Glu Thr Thr 
610 615 620 

Asn Gln Lys Thr Glu Leu Gln Ala Ile Gln Leu Ala Leu Gln Asp Ser 
625 630 635 640 

Gly Ser Glu Val Asn Ile Val Thr Asp Ser Gln Tyr Ala Leu Gly Ile 
645 650 655 

Ile Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gln Ile 
660 665 670 

Ile Glu Gln Leu Ile Lys Lys Glu Arg Val Tyr Leu Ser Trp Val Pro 
675 680 685 

Ala His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser 
690 695 700 

Ser Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile Asp Lys Ala Gln 
705 710 715 720 

Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala Ser Glu 
725 730 735 

Phe Asn Leu Pro Pro Ile Val Ala Lys Glu Ile Val Ala Ser Cys Asp 
740 745 750 

Lys Cys Gln Leu Lys Gly Glu Ala Ile His Gly Gln Val Asp Cys Ser 
755 760 765 

Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys Ile Ile 
770 775 780 

Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile 
785 790 795 800 

Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Ile Leu Lys Leu Ala 
805 810 815 

Gly Arg Trp Pro Val Lys Val Ile His Thr Asp Asn Gly Ser Asn Phe 
820 825 830 

Thr Ser Ala Ala Val Lys Ala Ala Cys Trp Trp Ala Gly Ile Gln Gln 
835 840 845 

Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Val Glu Ser 
850 855 860 

Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala 
865 870 875 880 

Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe Ile His Asn Phe 
885 890 895 

Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly Glu Arg Ile Ile 
900 905 910 

Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile 
915 920 925 

Ile Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro 
930 935 940 

Ile Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val 
945 950 955 960 

Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala 
965 970 975 

Lys Ile Ile Lys Asp Tyr Gly Lys Gln Met Ala Gly Ala Asp Cys Val 
980 985 990 

Ala Gly Arg Gln Asp Glu Asp 
995 

<210> 107 

<211> 107 

<212> PRT 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstructions of clade C rev protein sequence 

<400> 107 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Ala Leu Leu Gln Ala Val 
1 5 10 15 

Arg Ile Ile Lys Ile Leu Tyr Gln Ser Asn Pro Tyr Pro Lys Pro Glu 
20 25 30 

Gly Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Ala Arg 
35 40 45 

Gln Arg Gln Ile His Ser Ile Ser Glu Arg Ile Leu Ser Thr Cys Leu 
50 55 60 

Gly Arg Pro Ala Glu Pro Val Pro Leu Gln Leu Pro Pro Leu Glu Arg 
65 70 75 80 

Leu His Leu Asp Cys Ser Glu Asp Cys Gly Thr Ser Gly Thr Gln Gln 
85 90 95 

Ser Gln Gly Thr Thr Glu Gly Val Gly Ser Pro 
100 105 

<210> 108 
<211> 107 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Least squares center of tree reconstructions of clade C rev protein sequence 

<400> 108 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Ala Leu Leu Gln Ala Val 
1 5 10 15 

Arg Ile Ile Lys Ile Leu Tyr Gln Ser Asn Pro Tyr Pro Lys Pro Glu 
20 25 30 

Gly Thr Arg Gln Ala Arg Lys Asn Arg Arg Arg Arg Trp Arg Ala Arg 
35 40 45 

Gln Arg Gln Ile His Ser Ile Ser Glu Arg Ile Leu Ser Thr Cys Leu 
50 55 60 

Gly Arg Pro Ala Glu Pro Val Pro Leu Gln Leu Pro Pro Ile Glu Arg 
65 70 75 80 

Leu His Ile Gly Asp Ser Glu Ser Ser Gly Thr Ser Gly Thr Gln Gln 
85 90 95 

Ser Gln Gly Thr Thr Glu Gly Val Gly Ser Pro 
100 105 



<210> 109 

<211> 107 

<212> PRT 

<213> Artificial sequence 

<220> 

<223> Minimum of means center of tree reconstructions of clade C rev protein sequence 

<400> 109 

Met Ala Gly Arg Ser Gly Asp Ser Asp Glu Ala Leu Leu Gln Ala Val 
1 5 10 15 

Arg Ile Ile Lys Ile Leu Tyr Gln Ser Asn Pro Tyr Pro Lys Pro Glu 
20 25 30 

Gly Thr Arg Gln Ala Arg Lys Asn Arg Arg Arg Arg Trp Arg Ala Arg 
35 40 45 

Gln Arg Gln Ile His Ser Ile Ser Glu Arg Ile Leu Ser Thr Cys Leu 
50 55 60 

Gly Arg Pro Ala Glu Pro Val Pro Phe Gln Leu Pro Pro Ile Glu Arg 
65 70 75 80 

Leu His Ile Gly Asp Ser Glu Ser Ser Gly Thr Ser Gly Thr Gln Gln 
85 90 95 

Ser Gln Gly Thr Thr Glu Gly Val Gly Ser Pro 
100 105 

<210> 110 
<211> 101 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstructions of clade C tat protein sequence 

<400> 110 

Met Glu Pro Val Asp Pro Asn Leu Glu Pro Trp Asn His Pro Gly Ser 
1 5 10 15 

Gln Pro Lys Thr Ala Cys Asn Lys Cys Tyr Cys Lys Lys Cys Ser Tyr 
20 25 30 

His Cys Leu Val Cys Phe Leu Thr Lys Gly Leu Gly Ile Ser Tyr Gly 
35 40 45 

Arg Lys Lys Arg Arg Gln Arg Arg Arg Ala Pro Pro Ser Ser Glu Asp 
50 55 60 

His Gln Asn Pro Ile Ser Lys Gln Pro Leu Ser Gln Thr Arg Gly Asp 
65 70 75 80 

Pro Thr Gly Ser Glu Glu Ser Lys Lys Lys Val Glu Ser Lys Thr Glu 
85 90 95 

Thr Asp Pro Cys Asp 
100 

<210> 111 
<211> 101 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Least squares and minimum of means center of tree reconstruction of clade C tat protein sequence 

<400> 111 

Met Glu Pro Val Asp Pro Asn Leu Glu Pro Trp Asn His Pro Gly Ser 
1 5 10 15 

Gln Pro Lys Thr Pro Cys Asn Lys Cys Tyr Cys Lys His Cys Ser Tyr 
20 25 30 

His Cys Leu Val Cys Phe Gln Thr Lys Gly Leu Gly Ile Ser Tyr Gly 
35 40 45 

Arg Lys Lys Arg Arg Gln Arg Arg Ser Ala Pro Pro Ser Ser Glu Asp 
50 55 60 

His Gln Asn Pro Ile Ser Lys Gln Pro Leu Pro Gln Thr Arg Gly Asp 
65 70 75 80 

Pro Thr Gly Ser Glu Glu Ser Lys Lys Lys Val Glu Ser Lys Thr Glu 
85 90 95 

Thr Asp Pro Phe Asp 
100 

<210> 112 
<211> 192 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Most recent common ancestor reconstructions of clade C vif protein sequence 

<400> 112 

Met Glu Asn Arg Trp Gln Val Leu Ile Val Trp Gln Val Asp Arg Met 
1 5 10 15 

Lys Ile Arg Thr Trp Asn Ser Leu Val Lys His His Met Tyr Val Ser 
20 25 30 

Arg Arg Ala Lys Gly Trp Phe Tyr Arg His His Tyr Glu Ser Arg His 
35 40 45 

Pro Lys Ile Ser Ser Glu Val His Ile Pro Leu Gly Asp Ala Arg Leu 
50 55 60 

Val Ile Lys Thr Tyr Trp Gly Leu His Thr Gly Glu Arg Asp Trp His 
65 70 75 80 

Leu Gly His Gly Val Ser Ile Glu Trp Arg Leu Arg Arg Tyr Ser Thr 
85 90 95 

Gln Val Asp Pro Gly Leu Ala Asp Gln Leu Ile His Met His Tyr Phe 
100 105 110 

Asp Cys Phe Ala Asp Ser Ala Ile Arg Lys Ala Ile Leu Gly His Ile 
115 120 125 

Val Ser Pro Arg Cys Asp Tyr Gln Ala Gly His Asn Lys Val Gly Ser 
130 135 140 

Leu Gln Tyr Leu Ala Leu Thr Ala Leu Ile Lys Pro Lys Lys Ile Lys 
145 150 155 160 

Pro Pro Leu Pro Ser Val Lys Lys Leu Val Glu Asp Arg Trp Asn Lys 
165 170 175 

Pro Gln Lys Thr Arg Gly His Arg Gly Ser His Thr Met Asn Gly His 
180 185 190 

<210> 113 
<211> 192 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Least squares center of tree reconstruction of clade C vif protein sequence 

<400> 113 

Met Glu Asn Arg Trp Gln Val Leu Ile Val Trp Gln Val Asp Arg Met 
1 5 10 15 

Lys Ile Arg Thr Trp Asn Ser Leu Val Lys His His Met Tyr Val Ser 
20 25 30 

Arg Arg Ala Asn Gly Trp Phe Tyr Arg His His Tyr Glu Ser Arg His 
35 40 45 

Pro Lys Val Ser Ser Glu Val His Ile Pro Leu Gly Asp Ala Arg Leu 
50 55 60 

Val Ile Lys Thr Tyr Trp Gly Leu Gln Thr Gly Glu Arg Asp Trp His 
65 70 75 80 

Leu Gly His Gly Val Ser Ile Glu Trp Arg Leu Arg Arg Tyr Ser Thr 
85 90 95 

Gln Val Asp Pro Gly Leu Ala Asp Gln Leu Ile His Met His Tyr Phe 
100 105 110 

Asp Cys Phe Ala Asp Ser Ala Ile Arg Lys Ala Ile Leu Gly His Ile 
115 120 125 

Val Ile Pro Arg Cys Asp Tyr Gln Ala Gly His Asn Lys Val Gly Ser 
130 135 140 

Leu Gln Tyr Leu Ala Leu Thr Ala Leu Ile Lys Pro Lys Lys Ile Lys 
145 150 155 160 

Pro Pro Leu Pro Ser Val Arg Lys Leu Val Glu Asp Arg Trp Asn Lys 
165 170 175 

Pro Gln Lys Thr Arg Gly Arg Arg Gly Asn His Thr Met Asn Gly His 
180 185 190 

<210> 114 
<211> 192 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Minimum of means center of tree reconstructions of clade C vif protein sequence 

<400> 114 

Met Glu Asn Arg Trp Gln Val Leu Ile Val Trp Gln Val Asp Arg Met 
1 5 10 15 

Lys Ile Arg Thr Trp Asn Ser Leu Val Lys His His Met Tyr Val Ser 
20 25 30 

Arg Arg Ala Asn Gly Trp Phe Tyr Arg His His Tyr Glu Ser Arg His 
35 40 45 

Pro Lys Val Ser Ser Glu Val His Ile Pro Leu Gly Asp Ala Arg Leu 
50 55 60 

Val Ile Lys Thr Tyr Trp Gly Leu His Thr Gly Glu Arg Asp Trp His 
65 70 75 80 

Leu Gly His Gly Val Ser Ile Glu Trp Arg Leu Arg Arg Tyr Ser Thr 
85 90 95 

Gln Val Asp Pro Gly Leu Ala Asp Gln Leu Ile His Met His Tyr Phe 
100 105 110 

Asp Cys Phe Ala Asp Ser Ala Ile Arg Lys Ala Ile Leu Gly His Ile 
115 120 125 

Val Ile Pro Arg Cys Asp Tyr Gln Ala Gly His Asn Lys Val Gly Ser 
130 135 140 

Leu Gln Tyr Leu Ala Leu Thr Ala Leu Ile Lys Pro Lys Lys Ile Lys 
145 150 155 160 

Pro Pro Leu Pro Ser Val Arg Lys Leu Val Glu Asp Arg Trp Asn Lys 
165 170 175 

Pro Gln Lys Thr Arg Gly Arg Arg Gly Asn His Thr Met Asn Gly His 
180 185 190 

<210> 115 
<211> 96 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstructions for clade C vpr protein sequence 

<400> 115 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn 
1 5 10 15 

Glu Trp Thr Leu Glu Leu Leu Glu Glu Leu Lys Gln Glu Ala Val Arg 
20 25 30 

His Phe Pro Arg Pro Trp Leu His Ser Leu Gly Gln His Ile Tyr Glu 
35 40 45 

Thr Tyr Gly Asp Thr Trp Ala Gly Val Glu Ala Ile Ile Arg Ile Leu 
50 55 60 

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Gln His Ser Arg 
65 70 75 80 

Ile Gly Ile Ile Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser 
85 90 95 

<210> 116 
<211> 96 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Least squares center of tree reconstruction of clade C vpr protein sequence 

<400> 116 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn 
1 5 10 15 

Glu Trp Thr Leu Glu Ile Leu Glu Glu Leu Lys Gln Glu Ala Val Arg 
20 25 30 

His Phe Pro Arg Pro Trp Leu His Ser Leu Gly Gln Tyr Ile Tyr Glu 
35 40 45 

Thr Tyr Gly Asp Thr Trp Thr Gly Val Glu Ala Leu Ile Arg Ile Leu 
50 55 60 

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Gln His Ser Arg 
65 70 75 80 

Ile Gly Ile Leu Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser 
85 90 95 

<210> 117 
<211> 96 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Minimum of means center of tree reconstruction of clade C vpr protein sequence 

<400> 117 

Met Glu Gln Ala Pro Glu Asp Gln Gly Pro Gln Arg Glu Pro Tyr Asn 
1 5 10 15 

Glu Trp Thr Leu Glu Leu Leu Glu Glu Leu Lys Gln Glu Ala Val Arg 
20 25 30 

His Phe Pro Arg Pro Trp Leu His Ser Leu Gly Gln His Ile Tyr Glu 
35 40 45 

Thr Tyr Gly Asp Thr Trp Thr Gly Val Glu Ala Leu Ile Arg Ile Leu 
50 55 60 

Gln Gln Leu Leu Phe Ile His Phe Arg Ile Gly Cys Gln His Ser Arg 
65 70 75 80 

Ile Gly Ile Met Arg Gln Arg Arg Ala Arg Asn Gly Ala Ser Arg Ser 
85 90 95 

<210> 118 
<211> 86 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Most recent common ancestor reconstructions for clade C vpu protein sequence 

<400> 118 

Met Leu Asp Leu Ile Ala Arg Val Asp Tyr Arg Leu Gly Val Gly Ala 
1 5 10 15 

Leu Ile Val Ala Leu Ile Ile Ala Ile Val Val Trp Thr Ile Val Tyr 
20 25 30 

Ile Glu Tyr Arg Lys Leu Val Arg Gln Arg Lys Ile Asp Trp Leu Ile 
35 40 45 

Lys Arg Ile Arg Glu Arg Ala Glu Asp Ser Gly Asn Glu Ser Asp Gly 
50 55 60 

Asp Thr Glu Glu Leu Ser Thr Leu Val Asp Met Gly His Leu Arg Leu 
65 70 75 80 

Leu Asp Val Asn Asp Leu 
85 

<210> 119 
<211> 86 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> Least squares and minimum of means center of tree reconstruction of clade C vpu protein sequence 

<400> 119 

Met Leu Asp Leu Leu Ala Arg Val Asp Tyr Arg Leu Gly Val Gly Ala 
1 5 10 15 

Leu Ile Val Ala Leu Ile Ile Ala Ile Val Val Trp Thr Ile Val Tyr 
20 25 30 

Ile Glu Tyr Arg Lys Leu Leu Arg Gln Arg Lys Ile Asp Trp Leu Ile 
35 40 45 

Lys Arg Ile Arg Glu Arg Ala Glu Asp Ser Gly Asn Glu Ser Glu Gly 
50 55 60 

Asp Thr Glu Glu Leu Ser Thr Met Val Asp Met Gly His Leu Arg Leu 
65 70 75 80 

Leu Asp Val Asn Asp Leu 
85 

<210> 120 
<211> 376 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Deduced ancestor env protein sequence 
<400> 120 

Asn Lys Ser Glu Thr Asp Arg Trp Gly Leu Thr Lys Ser Ser Thr Thr 
1 5 10 15 

Thr Thr Thr Ala Ala Pro Thr Ser Ala Pro Val Ser Glu Lys Ile Asp 
20 25 30 

Met Val Asn Glu Thr Ser Ser Cys Ile Ala Gln Asn Asn Cys Thr Gly 
35 40 45 

Leu Glu Gln Glu Gln Met Ile Ser Cys Lys Phe Asn Met Thr Gly Leu 
50 55 60 

Lys Arg Asp Lys Thr Lys Glu Tyr Asn Glu Thr Trp Tyr Ser Thr Asp 
65 70 75 80 

Leu Val Cys Glu Gln Gly Asn Ser Thr Asp Asn Glu Ser Arg Cys Tyr 
85 90 95 

Met Asn His Cys Asn Thr Ser Val Ile Gln Glu Ser Cys Asp Lys His 
100 105 110 

Tyr Trp Asp Thr Ile Arg Phe Arg Tyr Cys Ala Pro Pro Gly Tyr Ala 
115 120 125 

Leu Leu Arg Cys Asn Asp Thr Asn Tyr Ser Gly Phe Met Pro Lys Cys 
130 135 140 

Ser Lys Val Val Val Ser Ser Cys Thr Arg Met Met Glu Thr Gln Thr 
145 150 155 160 

Ser Thr Trp Phe Gly Phe Asn Gly Thr Arg Ala Glu Asn Arg Thr Tyr 
165 170 175 

Ile Tyr Trp His Gly Arg Asp Asn Arg Thr Ile Ile Ser Leu Asn Lys 
180 185 190 

Tyr Tyr Asn Leu Thr Met Lys Cys Arg Arg Pro Gly Asn Lys Thr Val 
195 200 205 

Leu Pro Val Thr Ile Met Ser Gly Leu Val Phe His Ser Gln Pro Ile 
210 215 220 

Asn Asp Arg Pro Lys Gln Ala Trp Cys Trp Phe Gly Gly Lys Trp Lys 
225 230 235 240 

Asp Ala Ile Lys Glu Val Lys Gln Thr Ile Val Lys His Pro Arg Tyr 
245 250 255 

Thr Gly Thr Asn Asn Thr Asp Lys Ile Asn Leu Thr Ala Pro Gly Gly 
260 265 270 

Gly Asp Pro Glu Val Thr Phe Met Trp Thr Asn Cys Arg Gly Glu Phe 
275 280 285 

Leu Tyr Cys Lys Met Asn Trp Phe Leu Asn Trp Val Glu Asp Arg Asp 
290 295 300 

Val Thr Thr Gln Arg Pro Lys Glu Arg His Arg Arg Asn Tyr Val Pro 
305 310 315 320 

Cys His Ile Arg Gln Ile Ile Asn Thr Trp His Lys Val Gly Lys Asn 
325 330 335 

Val Tyr Leu Pro Pro Arg Glu Gly Asp Leu Thr Cys Asn Ser Thr Val 
340 345 350 

Thr Ser Leu Ile Ala Asn Ile Asp Trp Thr Asp Gly Asn Gln Thr Asn 
355 360 365 

Ile Thr Met Ser Ala Glu Val Ala 
370 375 

<210> 121 

<211> 883 

<212> PRT 

<213> Artificial sequence 

<220> 

<223> Deduced ancestor env protein sequence 
<400> 121 

Met Arg Val Lys Gly Ile Arg Lys Asn Tyr Gln His Leu Trp Arg Trp 
1 5 10 15 

Gly Thr Met Leu Leu Gly Met Leu Met Ile Cys Ser Ala Ala Glu Lys 
20 25 30 

Leu Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Lys Glu Ala Thr 
35 40 45 

Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala Tyr Asp Thr Glu Val 
50 55 60 

His Asn Val Trp Ala Thr His Ala Cys Val Pro Thr Asp Pro Asn Pro 
65 70 75 80 

Gln Glu Val Val Leu Glu Asn Val Thr Glu Asn Phe Asn Met Trp Lys 
85 90 95 

Asn Asn Met Val Glu Gln Met His Glu Asp Ile Ile Ser Leu Trp Asp 
100 105 110 

Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val Thr Leu 
115 120 125 

Asn Cys Thr Asp Asp Leu Arg Thr Asn Ala Thr Asn Thr Thr Asn Ser 
130 135 140 

Ser Ala Thr Thr Asn Thr Thr Ser Ser Gly Gly Gly Thr Met Glu Gly 
145 150 155 160 

Glu Lys Gly Glu Ile Lys Asn Cys Ser Phe Asn Val Thr Thr Ser Ile 
165 170 175 

Arg Asp Lys Met Gln Lys Glu Tyr Ala Leu Phe Tyr Lys Leu Asp Val 
180 185 190 

Val Pro Ile Asp Asn Asp Asn Asn Asn Thr Asn Asn Asn Thr Ser Tyr 
195 200 205 

Arg Leu Ile Asn Cys Asn Thr Ser Val Ile Thr Gln Ala Cys Pro Lys 
210 215 220 

Val Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Thr Pro Ala Gly Phe 
225 230 235 240 

Ala Ile Leu Lys Cys Asn Asp Lys Lys Phe Asn Gly Thr Gly Pro Cys 
245 250 255 

Thr Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Arg Pro Val Val 
260 265 270 

Ser Thr Gln Leu Leu Leu Asn Gly Ser Leu Ala Glu Glu Glu Val Val 
275 280 285 

Ile Arg Ser Glu Asn Phe Thr Asp Asn Ala Lys Thr Ile Ile Val Gln 
290 295 300 

Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn Asn Asn Thr 
305 310 315 320 

Arg Lys Ser Ile Pro Ile Gly Pro Gly Arg Ala Leu Tyr Ala Thr Gly 
325 330 335 

Lys Ile Ile Gly Asp Ile Arg Gln Ala His Cys Asn Leu Ser Arg Ala 
340 345 350 

Lys Trp Asn Asn Thr Leu Lys Gln Ile Val Thr Lys Leu Arg Glu Gln 
355 360 365 

Phe Gly Asn Asn Lys Thr Thr Ile Val Phe Asn Gln Ser Ser Gly Gly 
370 375 380 

Asp Pro Glu Ile Val Met His Ser Phe Asn Cys Gly Gly Glu Phe Phe 
385 390 395 400 

Tyr Cys Asn Ser Thr Gln Leu Phe Asn Ser Thr Trp His Phe Asn Gly 
405 410 415 

Thr Trp Gly Asn Asn Asn Thr Glu Arg Ser Asn Asn Ala Ala Asp Asp 
420 425 430 

Asn Asp Thr Ile Thr Leu Pro Cys Arg Ile Lys Gln Ile Ile Asn Met 
435 440 445 

Trp Gln Glu Val Gly Lys Ala Met Tyr Ala Pro Pro Ile Ser Gly Gln 
450 455 460 

Ile Arg Cys Ser Ser Asn Ile Thr Gly Leu Leu Leu Thr Arg Asp Gly 
465 470 475 480 

Gly Asn Asn Glu Asn Thr Asn Asn Thr Asp Thr Glu Ile Phe Arg Pro 
485 490 495 

Gly Gly Gly Asp Met Arg Asp Asn Trp Arg Ser Glu Leu Tyr Lys Tyr 
500 505 510 

Lys Val Val Lys Ile Glu Pro Leu Gly Val Ala Pro Thr Lys Ala Lys 
515 520 525 

Arg Arg Val Val Gln Arg Glu Lys Arg Ala Val Gly Met Leu Gly Ala 
530 535 540 

Met Phe Leu Gly Phe Leu Gly Ala Ala Gly Ser Thr Met Gly Ala Ala 
545 550 555 560 

Ser Met Thr Leu Thr Val Gln Ala Arg Gln Leu Leu Ser Gly Ile Val 
565 570 575 

Gln Gln Gln Asn Asn Leu Leu Arg Ala Ile Glu Ala Gln Gln His Leu 
580 585 590 

Leu Gln Leu Thr Val Trp Gly Ile Lys Gln Leu Gln Ala Arg Val Leu 
595 600 605 

Ala Val Glu Arg Tyr Leu Lys Asp Gln Gln Leu Leu Gly Ile Trp Gly 
610 615 620 

Cys Ser Gly Lys Leu Ile Cys Thr Thr Ala Val Pro Trp Asn Ala Ser 
625 630 635 640 

Trp Ser Asn Lys Ser Leu Asp Lys Ile Trp Asn Asn Met Thr Trp Met 
645 650 655 

Glu Trp Glu Arg Glu Ile Asp Asn Tyr Thr Gly Leu Ile Tyr Thr Leu 
660 665 670 

Ile Glu Glu Ser Gln Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu 
675 680 685 

Glu Leu Asp Lys Trp Ala Ser Leu Trp Asn Trp Phe Asp Ile Thr Asn 
690 695 700 

Trp Leu Trp Tyr Ile Lys Ile Phe Ile Met Ile Val Gly Gly Leu Val 
705 710 715 720 

Gly Leu Arg Ile Val Phe Ala Val Leu Ser Ile Val Asn Arg Val Arg 
725 730 735 

Gln Gly Tyr Ser Pro Leu Ser Phe Gln Thr Arg Leu Pro Ala Pro Arg 
740 745 750 

Gly Pro Asp Arg Pro Glu Gly Ile Glu Glu Glu Gly Gly Glu Arg Asp 
755 760 765 

Arg Asp Arg Ser Gly Arg Leu Val Asn Gly Phe Leu Ala Leu Ile Trp 
770 775 780 

Asp Asp Leu Arg Ser Leu Cys Leu Phe Ser Tyr His Arg Leu Arg Asp 
785 790 795 800 

Leu Leu Leu Ile Val Ala Arg Ile Val Glu Leu Leu Gly Arg Arg Gly 
805 810 815 

Trp Glu Ala Leu Lys Tyr Trp Trp Asn Leu Leu Gln Tyr Trp Ser Gln 
820 825 830 

Glu Leu Lys Asn Ser Ala Val Ser Leu Leu Asn Ala Thr Ala Ile Ala 
835 840 845 

Val Ala Glu Gly Thr Asp Arg Val Ile Glu Val Val Gln Arg Ala Cys 
850 855 860 

Arg Ala Ile Leu His Ile Pro Arg Arg Ile Arg Gln Gly Leu Glu Arg 
865 870 875 880 

Ala Leu Leu 

<210> 122 

<211> 25 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> First round primer UP-3 

<400> 122 

agactgcaga tgtgaagagg tacac 25 

<210> 123 

<211> 22 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> First round primer PEXTM6 

<400> 123 

ggatctggta tgctcatagc aa 22 



<210> 124 

<211> 25 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Second round primer PEXTM7 

<400> 124 

gatactgcag caacagcaac agctg 25 

<210> 125 

<211> 24 

<212> DNA 

<213> Artificial sequence 

<220> 

<223> Second round primer UP-5 

<400> 125 

gcaaagcttc tctggttggc agtg 24 